The CURSE of anti-spam testing

Martijn Grooten, Virus Bulletin

Although email spam has been pestering end-users for more than a decade, and anti-spam solutions have been helping them keep their inboxes tidy for almost as long, relatively few attempts have been made to test such solutions. Perhaps this is not too surprising: as an expert in the field once said, anti-spam testing is 'fiendishly difficult', not least because of the filters' tendency to block large amounts of spam before the tester can even look at it. Still, we believe that it is possible to run such tests, and this paper discusses how to do so.

The paper will consist of two parts. The first part will deal with the general concept of anti-spam testing and will set out guidelines for what a good anti-spam test should look like. A representative and reliable anti-spam test should fulfil five conditions:

Comparative: results of an anti-spam test should always be seen within the context of that test; hence the most meaningful results are those where various solutions are compared under the same, or very similar, circumstances and using the same, or very similar, corpuses.
Unbiased: the test should not be biased towards or against any filtering method, nor should end-users, when classifying email, have any knowledge of the products' decisions.
Real email in real time: the corpuses used in the test should consist of real ham and real spam email and should be sent in real time.
Statistically valid: the corpus should contain enough ham and spam to make claims about the products' performance within a reasonable error margin.
Explaining what is done: the testing setup, including but not limited to the way the corpuses are obtained and the way a gold standard is set, should be well explained.
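
To make the 'statistically valid' condition concrete, the following minimal Python sketch shows how corpus size affects the error margin on a measured spam catch rate. It uses the Wilson score interval, which is our choice for illustration and not necessarily the method used in the test described in this paper; the function name and the message counts are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(caught: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Return the Wilson score interval for a proportion caught/total.

    z = 1.96 corresponds to a confidence level of roughly 95%.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= caught <= total:
        raise ValueError("caught must lie between 0 and total")

    p = caught / total                      # observed catch rate
    z2_n = z * z / total
    denom = 1 + z2_n
    centre = (p + z2_n / 2) / denom         # interval midpoint
    half = z * math.sqrt(p * (1 - p) / total + z2_n / (4 * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical counts: the same 99% observed catch rate on two corpus sizes.
    for caught, total in [(99, 100), (9900, 10000)]:
        low, high = wilson_interval(caught, total)
        print(f"{caught}/{total}: catch rate between {low:.4f} and {high:.4f}")
```

With the same observed catch rate, the larger corpus gives a much narrower interval, which is why a test needs enough ham and spam before differences between products can be claimed with any confidence.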

The second part of the paper will discuss the comparative anti-spam test we have set up - in particular the decisions we have made based on these conditions. We will also discuss our own experience with running such tests.