Unattended spam filtering using machine learning: implementation, deployment and lessons

John Graham-Cumming

polymail is a commercial anti-spam library that is widely used around the world. The library supports both attended filtering (the user gives feedback on filtering mistakes, sometimes known as 'train on error') and unattended filtering (a black box that needs no user interaction). This talk details the implementation of 'train on everything' spam filtering with SURBL integration and automatic whitelisting, the deployment of an unattended spam filter, and lessons learned in over a year of deployment supporting tens of thousands of clients.
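To make the 'train on everything' idea concrete, here is a minimal sketch in Python. It is not polymail's implementation, which this abstract does not describe. It assumes a toy naive Bayes classifier with add-one smoothing, and the class and method names (`TrainOnEverythingFilter`, `classify_and_learn`) are hypothetical. The point it illustrates is that an unattended filter feeds every verdict back into its own training data instead of waiting for a user to report a mistake.

```python
import math
import re
from collections import Counter


class TrainOnEverythingFilter:
    """Toy naive Bayes filter that trains on its own verdicts (illustrative only)."""

    def __init__(self):
        # Per-class count of messages containing each token.
        self.counts = {"spam": Counter(), "ham": Counter()}
        # Per-class count of messages seen.
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Unique lowercase word tokens; a real filter would tokenize far more carefully.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, label):
        self.messages[label] += 1
        self.counts[label].update(self.tokens(text))

    def spam_log_odds(self, text):
        # Prior log-odds plus per-token log-likelihood ratios, all with add-one smoothing.
        score = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in self.tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.messages["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score

    def classify_and_learn(self, text):
        # 'Train on everything': the verdict itself becomes the training label.
        label = "spam" if self.spam_log_odds(text) > 0 else "ham"
        self.train(text, label)
        return label


# Seed with one labelled message per class, then run unattended.
f = TrainOnEverythingFilter()
f.train("cheap pills buy now", "spam")
f.train("meeting agenda for monday", "ham")
first = f.classify_and_learn("buy cheap pills")
second = f.classify_and_learn("agenda for monday meeting")
```

Because the filter learns from its own output, a wrong verdict reinforces itself. In this sketch nothing guards against that; external signals such as a URI blocklist lookup or a whitelist of known correspondents are the kind of check that could fill that role.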

Although polymail is a commercial product, this is a technical talk. Come prepared for some mathematics, lots of code, plenty of real-world data and no wild marketing claims of 99.999% accuracy!
