Mapping the email universe

Paul Judge CipherTrust
Dmitri Alperovitch CipherTrust

Three years ago, the spam battle focused solely on detecting spam messages within an email flow. That approach was palatable when spam accounted for only 10 to 20 percent of Internet mail. However, now that spam often makes up 80 to 90 percent of all inbound email, a more logical approach is to focus first on ensuring the deliverability of wanted email, and then on discarding the unwanted messages. In this talk, we present results from 14 months of research data, mapping all email senders to some point on the continuum from known bad to unknown to known good.

We first provide an overview of our approach to automated sender classification based on message traffic patterns. Starting from the known good side of the spectrum, we describe the traffic patterns used to identify good mail senders and present a view of the corresponding component of the email grid. Existing work on whitelisting has historically focused on manual list additions. Therefore, to obtain complete coverage, we developed a classification technique based on persistence, delivery breadth and outbound flow analysis. Our results show that only 3.5% of an organization’s connections come from regular communication partners. Further, we analysed the effect of email authentication protocols such as SPF and SenderID and determined that, contrary to popular wisdom, they are not especially useful in identifying legitimate senders: for example, six times more spam than ham passes SPF.
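As an illustration of how persistence, delivery breadth and outbound flow might be combined, the following is a minimal sketch and not the classifier described in the talk. It assumes that persistence is measured as the number of distinct days on which a sender was seen, delivery breadth as the number of distinct local recipients, and outbound flow as the number of messages the organization sent back to that sender. The record layout, the names `SenderStats`, `collect_stats` and `is_regular_partner`, and all thresholds are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SenderStats:
    """Per-sender traffic summary (hypothetical structure)."""
    days_seen: set = field(default_factory=set)    # persistence
    recipients: set = field(default_factory=set)   # delivery breadth
    outbound_messages: int = 0                     # outbound flow


def collect_stats(inbound, outbound):
    """Summarize traffic per remote sender.

    inbound:  iterable of (remote_sender, local_recipient, day)
    outbound: iterable of (local_user, remote_party, day)
    """
    stats = defaultdict(SenderStats)
    for sender, recipient, day in inbound:
        stats[sender].days_seen.add(day)
        stats[sender].recipients.add(recipient)
    for _local_user, remote, _day in outbound:
        stats[remote].outbound_messages += 1
    return stats


def is_regular_partner(s, min_days=3, min_recipients=2, min_outbound=1):
    """Assumed rule: all three signals must clear illustrative thresholds."""
    return (len(s.days_seen) >= min_days
            and len(s.recipients) >= min_recipients
            and s.outbound_messages >= min_outbound)


inbound = [
    ("partner.example", "alice", "2005-01-03"),
    ("partner.example", "bob", "2005-01-04"),
    ("partner.example", "alice", "2005-01-05"),
    ("oneoff.example", "alice", "2005-01-03"),
]
outbound = [("alice", "partner.example", "2005-01-04")]

stats = collect_stats(inbound, outbound)
partners = sorted(name for name, s in stats.items() if is_regular_partner(s))
print(partners)
```

Requiring outbound traffic in addition to persistent, broad inbound traffic is a design choice of this sketch: a sender that local users also write to is less likely to be unsolicited.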

On the other end of the continuum are persistent spam senders. To identify these egregious offenders, we designed a reputation system that compiles historical data on senders to single out the known bad ones. Left somewhere in the middle of the spectrum are senders about which, in the past, too little information was available to make a decision; these are typically referred to as ‘gray’ mail senders. Our results show that historical information is not available for these senders because, rather than sending large quantities of mail to as broad an audience as possible, spammers tend to spread their target lists across their entire network of zombies, enabling much of the network to fly under the radar and avoid detection. Our results show that over 70% of unwanted email originates from zombies. We will discuss the lifecycle of a zombie spam attack, and show how we are able to identify the creation of a new zombie network and fingerprint the botnets responsible for certain spam and phishing attacks. We will present demonstrations showing the real-time use of IRC channels to scan and exploit vulnerable machines, and reveal statistics on the rate at which botnets are enlarged, used in an attack and disposed of.
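The effect of spreading a spam run across many zombies can be illustrated with a toy history-based reputation rule. This is a sketch under stated assumptions, not the reputation system presented in the talk: the function `classify_senders`, the labels, the minimum-history threshold and the spam ratios are all hypothetical.

```python
from collections import Counter


def classify_senders(history, min_messages=50, bad_ratio=0.9, good_ratio=0.1):
    """Label each sender from its observed history.

    history: iterable of (sender, is_spam) observations.
    Senders with fewer than min_messages observations stay "unknown" (gray),
    because there is too little history to decide either way.
    """
    totals = Counter()
    spam = Counter()
    for sender, is_spam in history:
        totals[sender] += 1
        if is_spam:
            spam[sender] += 1

    labels = {}
    for sender, n in totals.items():
        if n < min_messages:
            labels[sender] = "unknown"
            continue
        ratio = spam[sender] / n
        if ratio >= bad_ratio:
            labels[sender] = "known bad"
        elif ratio <= good_ratio:
            labels[sender] = "known good"
        else:
            labels[sender] = "unknown"
    return labels


# One host sending 100 spam messages, versus the same volume spread
# over 100 zombies that each send a single message.
history = [("bulk-host", True)] * 100
history += [(f"zombie-{i}", True) for i in range(100)]
history += [("partner", False)] * 60

labels = classify_senders(history)
gray = sum(1 for label in labels.values() if label == "unknown")
print(labels["bulk-host"], labels["partner"], gray)
```

In this toy data the single high-volume host accumulates enough history to be labeled, while every zombie stays below the threshold, which is the gap that purely historical reputation leaves open.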