VBSpam testing methodology - ham corpus

It is impossible to test a spam filter's ability to block spam without also measuring what percentage of legitimate email it blocks. To this end, a good test requires a ham corpus that is reasonably large and representative of real legitimate email.
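As a rough illustration of why both figures are needed, the two rates can be computed side by side. This is only a sketch: the function names and the counts in the usage lines are made up for the example and are not results from any test.

```python
def spam_catch_rate(spam_blocked: int, spam_total: int) -> float:
    """Percentage of spam messages that the filter blocked."""
    return 100.0 * spam_blocked / spam_total


def false_positive_rate(ham_blocked: int, ham_total: int) -> float:
    """Percentage of legitimate (ham) messages that the filter blocked."""
    return 100.0 * ham_blocked / ham_total


# Illustrative counts only (not real test data).
sc = spam_catch_rate(spam_blocked=9900, spam_total=10000)
fp = false_positive_rate(ham_blocked=3, ham_total=2000)
print(f"spam catch rate: {sc:.2f}%, false positive rate: {fp:.2f}%")
```

A filter that blocks everything would score a perfect spam catch rate, so the first number means little without the second.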

In previous tests, the legitimate email sent to @virusbtn.com addresses was used as the ham corpus. While this corpus is entirely genuine, it has the downside that the emails in it cannot be shared with participants (for reasons of privacy, among others). Because being able to verify false positives and use them to improve filters is an important part of the test, some changes have been made that enable us to use a ham corpus that we can share in full.

The main part of the ham corpus consists of emails sent to public mailing lists. Each of these emails is sent by its author to a list server, which makes some minor modifications to the subject and the contents, adds several headers and relays the email to each subscriber. In our test, we remove the headers added by the list server and make it appear as if the email was sent directly to the virusbtn.com domain. This includes using the original values for HELO/EHLO, for the sender's IP address and its reverse DNS in the Received header, and for MAIL FROM in the SMTP envelope. Tests have confirmed that this gives a good ham corpus that products can filter without any major problems.
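The header clean-up described above might look roughly like the following. This is a minimal sketch, not the code used in the test: the set of headers treated as "added by the list server" is an assumption (the actual list is not published here), and build_received and its by_host parameter are hypothetical helpers for illustration.

```python
from email import message_from_string
from email.message import Message

# Assumption: headers typically added by list servers. The real set used
# in the test is not specified, so this list is illustrative only.
LIST_HEADERS = {"precedence", "x-mailing-list", "x-beenthere", "errors-to"}
LIST_HEADER_PREFIX = "list-"  # List-Id, List-Unsubscribe, List-Post, ...


def strip_list_headers(raw: str) -> Message:
    """Remove list-server headers; leave subject, body and other headers alone."""
    msg = message_from_string(raw)
    for name in list(msg.keys()):
        lower = name.lower()
        if lower in LIST_HEADERS or lower.startswith(LIST_HEADER_PREFIX):
            del msg[name]  # deletes every occurrence; no error if already gone
    return msg


def build_received(helo: str, client_ip: str, rdns: str, by_host: str) -> str:
    """Format a Received header value from the original sender's details.

    by_host is a hypothetical name for the receiving mail server.
    """
    return f"from {helo} ({rdns} [{client_ip}]) by {by_host}"


raw = (
    "List-Id: <dev.lists.example.org>\n"
    "Precedence: list\n"
    "From: alice@example.org\n"
    "Subject: [dev] patch review\n"
    "\n"
    "Looks good to me.\n"
)
cleaned = strip_list_headers(raw)
print(cleaned.keys())
print(build_received("mail.example.org", "192.0.2.1", "mail.example.org", "mx.example.net"))
```

Note that the subject tag added by the list server is deliberately left in place, which matches the statement that the subject and body are not changed.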

It should be noted that neither the subject and body nor the pre-existing headers of the email are changed. This leaves some room for 'cheating', e.g. by whitelisting emails that contain certain strings in the subject. This is explicitly not allowed; checks will be made to make sure no product is doing this.

The ham corpus may also contain some mailing-list emails that have not been modified in this way and whose sender is therefore the list server. A small number of legitimate newsletters will also be added to the ham corpus.
