Detecting spam pictures using statistical features

Sándor Antal, VirusBuster

  Technical stream: Thursday 20 September 2007, 12:00 - 12:40.

The problem we want to solve is the detection of spam messages whose essential information is carried in an attached picture.

Unfortunately, spammers now usually vary the pictures randomly (e.g. by adding small dots or lines), so the images in two instances of the same spam differ. Their aim is to prevent the spam pictures from being detected by hash-based methods. Our goal was to neutralise this trick by developing a fast method that is less sensitive to small differences between pictures than hash-based methods are.
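To illustrate why exact hashes are so easy to evade, here is a minimal sketch. It assumes SHA-256 as a stand-in for "a hash-based method" (the abstract does not name one) and uses synthetic bytes in place of a real image file. Flipping a single bit, roughly what one random dot might do to raw pixel data, yields an unrelated digest.

```python
import hashlib


def exact_hash(data: bytes) -> str:
    """Hex digest of the data; any change to the input changes the digest."""
    return hashlib.sha256(data).hexdigest()


# Synthetic stand-in for the bytes of a spam image (assumption: not a real file).
original = bytes(range(256)) * 4

# Simulate a tiny random disturbance by flipping one bit in one byte.
modified = bytearray(original)
modified[100] ^= 0x01
modified = bytes(modified)

digest_a = exact_hash(original)
digest_b = exact_hash(modified)

# Count how many hex digits differ between the two digests.
differing_hex_digits = sum(a != b for a, b in zip(digest_a, digest_b))

print(digest_a)
print(digest_b)
print("differing hex digits:", differing_hex_digits, "of", len(digest_a))
```

Because the two digests share no useful structure, a blocklist of exact hashes has to learn every variant separately.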

The methods we have developed and use are:

- calculating statistical parameters of the image file (size, average, standard deviation, etc.) without rendering the image;
- smoothing the image using different image-filtering (IF) methods (for example Gaussian blur or various types of granulation filter) to remove disturbances such as random dots;
- calculating global parameters of the image (e.g. brightness, contrast);
- using these parameters in a hash function that gives similar hash values for similar pictures, so that if the hash values of two pictures differ only slightly, the pictures are the same or almost the same;
- treating these parameters as spam/ham features and using the Bayesian method.

This means that it is enough to train the filter on only a few instances (maybe only one) of a spam and, unless the pictures are varied significantly, the filter can detect the modified variations as well.
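Two of these ideas, file-level statistics computed without rendering and a hash that stays close for similar inputs, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the bucket widths in `STEPS` and the synthetic byte strings are all hypothetical, and the smoothing and Bayesian steps are left out.

```python
import math


def byte_stats(data: bytes) -> tuple[int, float, float]:
    """Size, mean and standard deviation of the raw file bytes (no decoding)."""
    n = len(data)
    if n == 0:
        return 0, 0.0, 0.0
    mean = sum(data) / n
    variance = sum((b - mean) ** 2 for b in data) / n
    return n, mean, math.sqrt(variance)


# Hypothetical bucket widths for (size, mean, std); the abstract gives none.
STEPS = (512, 4.0, 4.0)


def tolerant_hash(params, steps=STEPS) -> tuple[int, ...]:
    """Quantise each parameter into a bucket index.

    Inputs whose parameters are close usually land in the same or
    neighbouring buckets, so their hashes are equal or nearly equal.
    """
    return tuple(int(p // s) for p, s in zip(params, steps))


def hash_distance(h1, h2) -> int:
    """Sum of absolute bucket differences; 0 means identical hashes."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Synthetic stand-ins for file contents (assumption: not real image files).
base = bytes((i * 7) % 256 for i in range(10000))

# The same "image" with five bytes overwritten, mimicking random dots.
noisy = bytearray(base)
for pos in (10, 2000, 4000, 6000, 8000):
    noisy[pos] = 255
noisy = bytes(noisy)

# A clearly different input.
other = bytes(i % 16 for i in range(3000))

h_base = tolerant_hash(byte_stats(base))
h_noisy = tolerant_hash(byte_stats(noisy))
h_other = tolerant_hash(byte_stats(other))

print("base vs noisy:", hash_distance(h_base, h_noisy))
print("base vs other:", hash_distance(h_base, h_other))
```

Plain quantisation has a known weakness: a parameter that sits just below a bucket boundary can tip into the next bucket after a tiny change. That is why the sketch compares hashes by distance rather than by equality.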

