Virus linguistics - searching for ethnic words

Masaki Suenaga, Symantec

Most viruses 'speak' English, and English-'speaking' mass-mailing worms tend to spread worldwide. Virus analysts can generally understand English too. As a result, we can tell our customers what kind of message is sent and what is targeted.

Even if a virus speaks French or Portuguese, we might still be able to extract the correct text from it, although there is quite a bit of room for error when extracting Portuguese words.

If the text is not written in the Western European code page, however, we have to guess which code page was used. If we guess wrong, we get nothing, and therefore cannot give customers the same level of precise information as we could for English text.
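To illustrate the guessing step, here is a minimal sketch in Python. It is not the method described in the paper. It assumes the string bytes have already been extracted from the sample, and the candidate list, the function names (score, rank_code_pages) and the scoring rule are all hypothetical choices made for illustration. The score is simply the share of non-space characters that decode to letters, which tends to separate a correct multi-byte decoding from a wrong one but cannot tell two single-byte code pages apart, so the ranked output is meant for an analyst to inspect rather than to trust blindly.

```python
# Hypothetical candidate list; a real analysis would choose code pages
# to suit the sample being examined.
CANDIDATES = ["cp1252", "cp1251", "shift_jis", "euc_kr", "gb2312", "big5"]


def score(text):
    """Crude plausibility measure: share of non-space characters that are letters."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    letters = sum(1 for ch in visible if ch.isalpha())
    return letters / len(visible)


def rank_code_pages(data, candidates=CANDIDATES):
    """Decode `data` with each candidate code page and rank the results.

    Returns a list of (score, code_page, decoded_text) tuples, best first.
    Code pages that cannot decode the bytes at all are skipped.
    """
    results = []
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this code page
        results.append((score(text), name, text))
    # Sort on the score only, highest first; ties keep candidate order.
    results.sort(key=lambda item: item[0], reverse=True)
    return results


if __name__ == "__main__":
    # Stand-in for bytes extracted from a sample (assumption for the demo).
    sample = "こんにちは".encode("shift_jis")
    for value, name, text in rank_code_pages(sample):
        print(f"{value:.2f}  {name:10}  {text}")
```

A wrong single-byte guess usually still decodes without error and produces accented Latin or Cyrillic letters mixed with punctuation, which is why the sketch ranks the candidates instead of stopping at the first one that decodes.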

Encrypted English strings can be decrypted by technical means. A natural language, however, can look like hieroglyphics to those unfamiliar with it. Machine translation is widely used nowadays and can be very useful once we know the correct strings and which language is used. The question is, how do we determine these? This paper will provide some tips.
