Unattended spam filtering using machine learning: implementation, deployment and lessons

John Graham-Cumming

  Technical stream: Thursday 12 October 2006, 14:40 - 15:20.

polymail is a commercial anti-spam library used widely around the world. The library supports both attended filtering, in which the user gives feedback on filtering mistakes (sometimes known as 'train on error'), and unattended filtering, in which the filter runs as a black box with no user interaction. This talk details the implementation of 'train on everything' spam filtering with SURBL integration and automatic whitelisting, the deployment of an unattended spam filter, and the lessons learned in over a year of deployment supporting tens of thousands of clients.

Although polymail is a commercial product, this is a technical talk. Come prepared for some mathematics, lots of code, plenty of real-world data and no wild marketing claims of 99.999% accuracy!
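
To make the distinction between attended ('train on error') and unattended ('train on everything') filtering concrete, here is a minimal sketch in Python. It is not polymail's code or API: the ToyBayesFilter class, its whitespace tokenizer, the add-one smoothing and both training helpers are hypothetical names and choices made purely for illustration, and SURBL lookups and automatic whitelisting are left out.

```python
import math
from collections import Counter


class ToyBayesFilter:
    """Hypothetical token-count naive Bayes filter (illustration only)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Assumption: lowercase whitespace tokens stand in for a real tokenizer.
        self.counts[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_score(self, text):
        """Log-odds that text is spam; a positive value favours spam."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) + 1
        totals = {label: sum(c.values()) for label, c in self.counts.items()}
        # Prior from message counts, with add-one smoothing.
        score = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for token in text.lower().split():
            p_spam = (self.counts["spam"][token] + 1) / (totals["spam"] + vocab)
            p_ham = (self.counts["ham"][token] + 1) / (totals["ham"] + vocab)
            score += math.log(p_spam) - math.log(p_ham)
        return score

    def classify(self, text):
        return "spam" if self.spam_score(text) > 0 else "ham"


def train_on_error(f, text, user_label):
    """Attended: learn only when the user's label contradicts the filter."""
    if f.classify(text) != user_label:
        f.train(text, user_label)


def train_on_everything(f, text):
    """Unattended: learn from the filter's own verdict on every message."""
    verdict = f.classify(text)
    f.train(text, verdict)
    return verdict
```

In this sketch, train_on_everything feeds the filter's own verdicts back into its training data, so it needs no user but can also reinforce its own mistakes; train_on_error changes the model only when a user reports a wrong verdict.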

