Bayesian filter
Filtering of email using probabilities of the occurrence of individual words in ham and spam
A Bayesian filter is a spam filter that uses the probabilities of individual words
occurring in ham (legitimate email) and spam, and then applies Bayes' theorem to
compute the probability that an email containing those words is spam.
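The word-probability approach described above can be sketched as a small naive Bayes classifier. This is a minimal sketch, not a reference implementation. It assumes a simple regular-expression tokenizer, equal prior probabilities for spam and ham, add-one (Laplace) smoothing, and that words occur independently of each other (the "naive" assumption). The class and method names (`BayesianFilter`, `train`, `spam_probability`) and the training messages are hypothetical. Real filters differ in tokenization, priors and how many words they combine.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return the set of distinct words it contains."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class BayesianFilter:
    """Hypothetical minimal word-based spam filter (a sketch only)."""

    def __init__(self) -> None:
        # Number of training emails of each class that contain each word.
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text: str, is_spam: bool) -> None:
        """Record which words appear in one user-labelled email."""
        words = tokenize(text)
        if is_spam:
            self.spam_counts.update(words)
            self.n_spam += 1
        else:
            self.ham_counts.update(words)
            self.n_ham += 1

    def spam_probability(self, text: str) -> float:
        """P(spam | words), assuming equal priors and independent words."""
        log_odds = 0.0
        for word in tokenize(text):
            # Add-one smoothing keeps unseen words from giving P = 0 or 1.
            p_word_spam = (self.spam_counts[word] + 1) / (self.n_spam + 2)
            p_word_ham = (self.ham_counts[word] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_word_spam) - math.log(p_word_ham)
        # Convert log-odds back to a probability without overflowing math.exp.
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        return math.exp(log_odds) / (1.0 + math.exp(log_odds))


# Hypothetical training data labelled by the user.
f = BayesianFilter()
for text in ["cheap viagra pills now", "buy cheap stocks now", "cheap meds online"]:
    f.train(text, is_spam=True)
for text in ["meeting agenda for monday", "lunch on monday with the team",
             "project report attached"]:
    f.train(text, is_spam=False)

print(round(f.spam_probability("cheap pills"), 3))     # 0.889
print(round(f.spam_probability("monday meeting"), 3))  # 0.143
```

Summing per-word log-odds is equivalent to multiplying the per-word probability ratios, but it avoids floating-point underflow when an email contains many words.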
The likelihood of specific words occurring in ham and spam differs considerably between users. For
instance, medical terminology and stock-exchange-related words are commonly seen in spam emails; however,
employees in the pharmaceutical or financial industries, respectively, are likely to see these words
in legitimate email. Bayesian filters therefore work best when they are 'taught' by the user which
emails count as spam and which count as ham.
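To illustrate why per-user training matters, the sketch below estimates the spam probability of a single word from two users' training counts. The word "prescription", the function name `word_spam_probability` and all counts are hypothetical; the estimate assumes equal priors and add-one smoothing. Both users receive the same spam, but only the pharmacist also sees the word in legitimate email.

```python
def word_spam_probability(spam_with_word: int, spam_total: int,
                          ham_with_word: int, ham_total: int) -> float:
    """Estimate P(spam | word) for one word, assuming equal class priors.

    Counts are numbers of training emails containing the word; add-one
    (Laplace) smoothing keeps the result strictly between 0 and 1.
    """
    p_word_given_spam = (spam_with_word + 1) / (spam_total + 2)
    p_word_given_ham = (ham_with_word + 1) / (ham_total + 2)
    return p_word_given_spam / (p_word_given_spam + p_word_given_ham)


# Hypothetical counts for the word "prescription" (illustrative only).
generic_user = word_spam_probability(40, 100, 0, 100)
pharmacist = word_spam_probability(40, 100, 60, 100)

print(f"generic user: {generic_user:.3f}")  # generic user: 0.976
print(f"pharmacist:   {pharmacist:.3f}")    # pharmacist:   0.402
```

The same word is strong evidence of spam for one user and weak evidence of ham for the other, which is why a filter trained on another person's mail can misclassify legitimate messages.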
Spammers are known to use various techniques in their attempts to get past spam filters,
such as adding random bits of text or hiding the content of the email in an image. An overview of such
techniques can be found in The Spammers' Compendium on this website.