Detecting spam pictures using statistical features

Sándor Antal, VirusBuster

The problem we want to solve is detecting spam messages whose essential content is carried in an attached picture.

Unfortunately, spammers nowadays usually vary their pictures randomly (e.g. by adding small dots or lines), so the images in two instances of the same spam differ. Spammers do this to keep their spam pictures from being detected by hash-based methods: with an exact hash of the image data, changing even a single pixel produces a completely different hash value. Our goal was to eliminate the problems caused by this trick and to develop a fast method that is less sensitive than hash-based methods to small differences between pictures.
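To illustrate why exact hashes fail here, the minimal Python sketch below (an illustration, not VirusBuster's code) builds a synthetic 64×64 grayscale "image" as a list of rows, adds one spammer-style dot, and compares SHA-256 digests of the raw pixel bytes with a coarse statistic, the mean brightness. The helper names `make_image`, `to_bytes` and `mean_brightness` are hypothetical.

```python
import hashlib


def make_image(width, height, value):
    """Hypothetical stand-in for a decoded grayscale image: rows of 0-255 ints."""
    return [[value] * width for _ in range(height)]


def to_bytes(img):
    """Flatten the pixel rows into a byte string for hashing."""
    return bytes(p for row in img for p in row)


def mean_brightness(img):
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)


original = make_image(64, 64, 200)
variant = [row[:] for row in original]
variant[10][10] = 0  # the spammer's single "random dot"

h1 = hashlib.sha256(to_bytes(original)).hexdigest()
h2 = hashlib.sha256(to_bytes(variant)).hexdigest()
print(h1 == h2)  # False: the exact hash no longer matches
print(mean_brightness(original), mean_brightness(variant))
```

The digests share nothing useful, while the mean brightness moves by less than 0.05 on a 0 to 255 scale, which is the kind of robustness the statistical approach relies on.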

The methods we have developed and use are:
- calculating statistical parameters of the image file (size, average, STD, etc.) without rendering the image;
- smoothing the image using different IF methods (for example Gaussian blur or various types of granulation filters) to remove disturbances (e.g. random dots);
- calculating global parameters of the image (e.g. brightness, contrast);
- using these parameters in a hash function that gives similar hash values for similar pictures, so that if the hash values of two pictures differ only slightly, the pictures are the same or almost the same;
- treating these parameters as spam/ham features and applying the Bayesian method.

This means that it is enough to train the filter on only a few (perhaps only one) spam instances, and (unless the pictures are varied significantly) the filter can detect the modified variations as well.
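The first step, statistics computed without rendering the image, can be sketched as follows. This assumes that "average" and "STD" mean the mean and standard deviation of the raw byte values of the encoded file; the abstract does not say exactly which statistics are used, and `file_statistics` is a hypothetical name.

```python
import math


def file_statistics(data: bytes) -> dict:
    """Statistics over the raw, still-encoded bytes of an image attachment.
    The image is never decoded or rendered, so this step is cheap."""
    n = len(data)
    if n == 0:
        return {"size": 0, "mean": 0.0, "std": 0.0}
    mean = sum(data) / n  # iterating over bytes yields ints 0-255
    variance = sum((b - mean) ** 2 for b in data) / n
    return {"size": n, "mean": mean, "std": math.sqrt(variance)}


stats = file_statistics(bytes([0, 255, 0, 255]))
print(stats)
```

In a filter the bytes would come from the decoded MIME attachment, for example `open(path, "rb").read()` for a saved file.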
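Smoothing and the global parameters can be illustrated on a decoded grayscale image stored as a list of rows. This sketch assumes a 3×3 binomial kernel as the Gaussian blur and defines brightness as the mean intensity and contrast as the standard deviation (RMS contrast); the abstract names these parameters but not their exact definitions, and the granulation filters it mentions are not reproduced here.

```python
import math


def gaussian_blur_3x3(img):
    """Blur with the 3x3 binomial kernel [1,2,1]^T [1,2,1] / 16, a common small
    approximation of a Gaussian. Pixels outside the border are clamped to the edge."""
    h, w = len(img), len(img[0])
    weights = (1, 2, 1)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                yy = min(max(y + dy, 0), h - 1)
                for dx in (-1, 0, 1):
                    xx = min(max(x + dx, 0), w - 1)
                    acc += weights[dy + 1] * weights[dx + 1] * img[yy][xx]
            row.append(acc / 16)
        out.append(row)
    return out


def global_parameters(img):
    """Brightness = mean intensity; contrast = standard deviation of intensities."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    return {"brightness": mean, "contrast": std}


clean = [[200.0] * 16 for _ in range(16)]
noisy = [row[:] for row in clean]
noisy[8][8] = 0.0  # a single random dark dot

blurred = gaussian_blur_3x3(noisy)
print(blurred[8][8])
print(global_parameters(noisy)["contrast"], global_parameters(blurred)["contrast"])
```

Blurring spreads the lone dot over its neighbours: its deviation at its own position drops from 200 to 50, and the contrast of the noisy copy falls, moving its global parameters back toward those of the clean picture.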
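The abstract does not specify the hash function, so the following is only one possible similarity-preserving scheme, assumed for illustration: each parameter is quantized into buckets of a chosen width, the tuple of bucket indices serves as the hash, and two hashes count as "close" if no component differs by more than one bucket. The bucket widths, parameter values and function names are all hypothetical.

```python
def similarity_hash(params, steps):
    """Quantize each parameter into a bucket index; the tuple is the hash.
    Parameters are taken in sorted-name order so hashes are comparable."""
    return tuple(int(params[name] // step) for name, step in sorted(steps.items()))


def hash_distance(h1, h2):
    """Largest per-component bucket difference (Chebyshev distance)."""
    return max(abs(a - b) for a, b in zip(h1, h2))


def looks_same(h1, h2, tolerance=1):
    """Treat two pictures as the same or almost the same if every component
    of their hashes is within `tolerance` buckets."""
    return hash_distance(h1, h2) <= tolerance


# Hypothetical bucket widths and parameter values, for illustration only.
STEPS = {"brightness": 8.0, "contrast": 4.0, "size": 1024}
known_spam = {"brightness": 200.0, "contrast": 31.0, "size": 20480}
variant = {"brightness": 199.95, "contrast": 31.6, "size": 20517}
unrelated = {"brightness": 90.0, "contrast": 55.0, "size": 61440}

h_spam = similarity_hash(known_spam, STEPS)
h_variant = similarity_hash(variant, STEPS)
h_unrelated = similarity_hash(unrelated, STEPS)
print(h_spam, h_variant, looks_same(h_spam, h_variant))
print(h_unrelated, looks_same(h_spam, h_unrelated))
```

A tolerance of at least one bucket matters: the variant's brightness of 199.95 lands in bucket 24 while the original's 200.0 lands in bucket 25, even though the two pictures are nearly identical.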
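The parameters can then be fed to a Bayesian classifier. As an assumed design, the sketch below turns each component of a quantized parameter tuple into a token such as `f0=25` and uses a simplified naive Bayes with Laplace smoothing that scores only the tokens present. The training tuples are invented, and the abstract does not describe VirusBuster's actual feature encoding or probability model. The sketch shows how a single spam example can be enough: after training on one spam picture, a slightly varied copy still scores as spam.

```python
import math
from collections import Counter


def features(h):
    """Turn a tuple of bucket indices into tokens like 'f0=25'."""
    return [f"f{i}={v}" for i, v in enumerate(h)]


class NaiveBayes:
    """Simplified naive Bayes over token presence, with Laplace smoothing."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, tokens, label):
        self.token_counts[label].update(set(tokens))
        self.doc_counts[label] += 1

    def spam_probability(self, tokens):
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            n = self.doc_counts[label]
            # Smoothed prior P(label)
            score = math.log((n + 1) / (total_docs + 2))
            for tok in tokens:
                # Smoothed P(token present | label); only present tokens are scored
                score += math.log((self.token_counts[label][tok] + 1) / (n + 2))
            log_scores[label] = score
        # Normalize the two log scores into P(spam | tokens)
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)


nb = NaiveBayes()
nb.train(features((25, 7, 20)), "spam")  # a single known spam picture
for h in [(11, 13, 60), (18, 2, 5), (30, 20, 40)]:  # invented ham pictures
    nb.train(features(h), "ham")

p_variant = nb.spam_probability(features((24, 7, 20)))  # slightly varied spam
p_unrelated = nb.spam_probability(features((11, 13, 60)))
print(round(p_variant, 2), round(p_unrelated, 2))
```

With these numbers the varied copy scores roughly 0.9 and the ham picture roughly 0.22, because the variant still shares two of its three tokens with the one trained spam example.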
