Hotmail, Yahoo!, Gmail users hacked, but how?

2009-12-01

Terry Zink

Microsoft, USA
Editor: Helen Martin

Abstract

In October, thousands of usernames and passwords belonging to Hotmail users were posted publicly online, and anyone could have taken them, logged into the accounts and done something with them. Gmail and Yahoo! were also targeted. Terry Zink asks: how did the hacker(s) gain access to all of these accounts and usernames? Should we be afraid that someone will guess our passwords? Why did they do it? What did they do with it? And should we worry about it happening to us?


In October this year, thousands of usernames and passwords belonging to Hotmail users were posted on the technology website Neowin [1]. They were posted for everyone to see, and anyone could have taken them, logged into the accounts and done something with them. The accounts have since been reset. Computerworld reported on the story [2]:

If Neowin’s account is accurate, the Hotmail hack or phishing attack would be one of the largest suffered by a web-based email service.

Last year, a Tennessee college student was accused of breaking into former Alaska governor Sarah Palin’s Yahoo! Mail account in the run-up to the US presidential election. Palin, the Republican vice presidential nominee at the time, lost control of her personal account when someone identified only as “rubico” reset her password after guessing answers to several security questions.

Shortly after the Palin account hijack, Computerworld confirmed that the automated password-reset mechanisms used by Hotmail, Yahoo! Mail and Google’s Gmail could be abused by anyone who knew an account’s username and could answer a single security question.

The BBC reports that Gmail and Yahoo! were also targeted [3]. The situation is that some hacker obtained information that most people think is secret and then posted it publicly. A number of questions arise: how did the hacker gain access to all of these accounts and usernames? Is the Sarah Palin story relevant in this case? Should we be afraid that someone will guess our passwords? Why did they do it? What did they do with it? And should we worry about it happening to us?

How did it happen?

Depending on which reports you read elsewhere on the web, there are many theories about how this information could have been obtained. Let’s consider some of them.

The attacker hacked into Hotmail, Google and Yahoo! and stole the information

This particular mechanism involves the hacker breaking into Hotmail, Google and Yahoo!’s servers, stealing information, and then exiting before anyone had noticed. To do this, the hacker would need to exploit a known security weakness, or have known about an obscure security flaw that had not yet been patched, or have some sort of inside information that allowed them to bypass the security mechanisms with stolen credentials.

An external hack where someone breaks into Hotmail’s servers and accesses the account information is unlikely. It is much more likely that the attacker obtained the information through social engineering. Why is this more likely? For one, breaking into the servers would involve having to get past all of the firewalls and security measures that Microsoft/Hotmail has in place to keep intruders out. While not impossible, this would not be easy.

Second, even if an attacker were to break in and steal the account information, it is very unlikely that they could access the associated passwords. Passwords are not stored in clear text; they are stored as one-way hashes. All firms with good security store them this way.

One-way hashes are a basic security mechanism. They are based on the idea that a function can be easy to compute in one direction but difficult to compute in the opposite direction. For example, the squaring function $f(x) = x^2$ is easy to compute: $2^2 = 4$, $4^2 = 16$, $5.3^2 = 28.09$. However, it is much more difficult to calculate square roots. We know that the square root of 16 is 4, or that the square root of 64 is 8, because we have memorized our multiplication tables or know certain square roots. But what is the square root of 79? That is not so easy. Of course, we have calculators that can determine this, but it is more computationally expensive to do so. Password hashing algorithms work in a similar way, except that a good one-way hash has no practical inverse at all: it is easy to turn your password into hashed text, but there is no way to compute the password directly from the hashed text. An attacker who knows the algorithm can only hash candidate passwords one by one and compare the results, and for a strong password this is so time-consuming that by the time it had been broken the data would be stale.
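The idea can be illustrated with Python’s standard hashlib module. This is a minimal sketch under our own assumptions: the article does not say which algorithm Hotmail, Gmail or Yahoo! use, so PBKDF2-HMAC-SHA256 with a random salt and an illustrative iteration count stands in here, and the helper names are hypothetical. It shows that hashing forwards is cheap, while the only route back is to hash guesses and compare them, which succeeds quickly only when the password is weak.

```python
import hashlib
import os

def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """One-way step: derive a fixed-length digest from a password and a salt.
    The algorithm and iteration count are illustrative assumptions."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

def guess_password(digest: bytes, salt: bytes, candidates):
    """There is no inverse function, so 'reversing' a hash means hashing
    guesses one by one and comparing; the cost grows with every candidate."""
    for candidate in candidates:
        if hash_password(candidate, salt) == digest:
            return candidate
    return None

salt = os.urandom(16)                      # random per-user salt, stored with the digest
digest = hash_password("123456", salt)
print(digest.hex())                        # 64 hex characters; nothing readable survives
print(guess_password(digest, salt, ["password", "qwerty", "123456"]))  # 123456
```

Raising the iteration count makes every guess proportionally more expensive, which is what makes a stolen hash of a strong password close to useless.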

If you ever forget your password for a website and click the link to recover it, there are two options:

  1. You are given a password reset where you click the link and type in a new password.

  2. You click the link and the password is sent to you in clear text.

For option 1, you must reset your password because even the organizations storing it (e.g. Google, Yahoo! or Hotmail) do not know it. It is stored as hashed text (i.e. a random-looking string of characters produced by the hashing algorithm), and recovering the password from that string is computationally infeasible, so they cannot give it back to you. If a hacker were to break in and steal your password, all they would have is the hashed text, and entering the hashed text into the password field is the equivalent of entering the wrong password. The account would not authenticate.
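The login check that makes a stolen hash useless can be sketched as follows. The record layout (a salt plus a digest), the iteration count and the helper names make_record and check_login are hypothetical, chosen for illustration rather than taken from any provider. The site hashes whatever is typed into the password field and compares the result with the stored digest, so typing the digest itself simply produces a different hash.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not any provider's real setting

def make_record(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest): what a site stores instead of the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def check_login(record: tuple[bytes, bytes], attempt: str) -> bool:
    """Hash the attempt with the stored salt and compare in constant time."""
    salt, stored = record
    candidate = hashlib.pbkdf2_hmac("sha256", attempt.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)

record = make_record("s3cret-pass")
print(check_login(record, "s3cret-pass"))   # True: the real password
stolen_hash = record[1].hex()
print(check_login(record, stolen_hash))     # False: the stolen hash is just a wrong password
```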

Storing passwords in hashed text is standard practice in the industry, so even if a hacker broke in and stole information from Hotmail, Google and Yahoo!, there wouldn’t be much useful there to steal (in terms of passwords).

This suggests that the attacker tricked the user into handing over their user account and password through some other mechanism.

Hackers guessed the password

In 2008, Republican vice-presidential nominee Sarah Palin had her email account hacked. How? An attacker guessed the answers to her security questions and reset her password.

Some websites have a set of security questions that let you reset your password if you answer them correctly. This may work if few people know you, but for a public figure like Palin a lot of personal information is publicly available. Answers to questions like ‘What is your father’s name?’ or ‘What year did you graduate from college?’ can easily be discovered using a quick Internet search. It wouldn’t take much effort to find a username and answer the questions.

However, in the Hotmail, Google and Yahoo! case, whilst I suspect that social engineering was used to obtain the information, I do not suspect security-question guessing. Guessing Palin’s answers worked, but it is not a scalable model for spammers. Palin is well known, and you could plausibly guess her information simply by reading about her online. Accessing 10,000 accounts that way would be far too time-consuming, and the people being hacked are not well known, so their information could not be guessed other than by chance.

The users fell for a social engineering scam

The general consensus is that these Hotmail, Google and Yahoo! users were victims of phishing scams. Such a scam would look something like this:

A Hotmail user receives a spam message in their inbox, which probably looks as if it has come from Windows Live. There is some call to action wherein the message says, for example, that Hotmail is upgrading its infrastructure and requires users to log into their account and verify their credentials.

In addition, bots had probably broken the CAPTCHA on Hotmail’s sign-up page and registered accounts in bulk, so these spam messages would actually have been sent from inside Hotmail.

Spam of this kind can be more difficult to filter than spam sent from another service, because it comes from Hotmail’s own servers. So we have Hotmail users spamming Hotmail users, possibly with a From: address like ‘Windows Live Mail Security <live.security.something@...>’. Some users did not recognize that this was a phishing scam, entered their credentials, and the damage was done.

If the user clicked the link and filled in their details, the information would have been relayed back to the spammer, who would then have the user’s credentials.

Spoofing scams like these are among the oldest spammer tactics. The most commonly associated mechanism is phishing where the spammer impersonates a bank, but spammers will also impersonate the IRS, the Better Business Bureau, CNN, and so forth. All of these are attempts to trick the user into taking an action, whether it is downloading and installing malware or giving up their username and password.
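Part of what makes the Hotmail-to-Hotmail scenario hard to catch is visible in the From: line itself: the address belongs to an ordinary account on the same service, so only the display name gives the game away. The sketch below, using Python’s standard email.utils.parseaddr, shows one crude heuristic a filter might apply. The allowlist address, the keyword list and the example addresses are hypothetical placeholders, not Microsoft’s real senders or any product’s actual rule, and a keyword as broad as ‘security’ would also flag plenty of legitimate mail.

```python
from email.utils import parseaddr

# Hypothetical allowlist: addresses a genuine Windows Live notice might come from.
# These are placeholders for illustration, not Microsoft's actual senders.
OFFICIAL_SENDERS = {"notices@example-live.test"}
BRAND_WORDS = ("windows live", "hotmail", "security")

def looks_like_brand_spoof(from_header: str) -> bool:
    """Flag a From: header whose display name invokes the brand
    but whose address is not on the (hypothetical) official list."""
    display_name, address = parseaddr(from_header)
    claims_brand = any(word in display_name.lower() for word in BRAND_WORDS)
    return claims_brand and address.lower() not in OFFICIAL_SENDERS

print(looks_like_brand_spoof("Windows Live Mail Security <live.security.team@mail.example>"))  # True
print(looks_like_brand_spoof("Alice Smith <alice@mail.example>"))                             # False
print(looks_like_brand_spoof("Windows Live <notices@example-live.test>"))                     # False
```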

The spammers/hackers attacked some other weak site and stole information from there

Users falling prey to a phishing scam is one of the most likely explanations for this attack, but it is not the only possibility. The problem is that there are so many other possible attack vectors. Here’s one: spammers don’t have to target Hotmail users via a phishing scam. Notice that it was not only Hotmail users that surrendered their credentials, but also Yahoo! and Google users. A hacker would have a difficult time hacking Yahoo!, Google and Microsoft directly, but what if they attacked an online discussion forum or a blogging service?

Many websites across the Internet allow you to log into their websites using your email address as the username. How many people use their email address… and also use the same password across multiple sites? If a hacker were to break into an online forum – one with a low level of security – they could count on the fact that users tend to reuse usernames and passwords. Hackers get to take advantage of statistics – given enough people, some of them will be hits (i.e. using the same username/password combination).

Recall that the more mature services store passwords in hashed text. Since BBC News confirmed that the accounts were genuine and predominantly originated in Europe, I’m willing to bet that some discussion forum in Europe had its users’ usernames and passwords stored in clear text and was broken into, and the information stolen. The attacker then went and verified which ones unlocked the users’ accounts and discarded the rest. They then eventually posted them online for all to see.

Of course, even this may not be the whole story; it could have been easier than that. According to The Register, the most common password in the leaked list was ‘123456’, and ‘123456789’ was the second most common [4]. Together these account for about 0.82% of the passwords. So if you acquired a large list of usernames and tried each of these two passwords, each account would have a slightly less than 1% chance of opening. 1% is small, but it’s greater than 0%, and across thousands of accounts an automated attempt would almost certainly turn up some hits.
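To put rough numbers on this, assume (as a simplification) that each account independently has a probability p = 0.0082 of using one of those two passwords, the share reported for the leaked list, and that n accounts are tried:

$$
\mathbb{E}[\text{hits}] = np, \qquad P(\text{at least one hit}) = 1 - (1 - p)^n
$$

$$
n = 1000:\quad \mathbb{E}[\text{hits}] = 8.2, \qquad P = 1 - 0.9918^{1000} \approx 1 - e^{-8.23} \approx 0.9997
$$

So even 1,000 usernames would be expected to yield about eight working accounts, and the chance of getting none at all is below 0.03%.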

Techniques for breaking into a discussion forum’s back end are beyond the scope of this article, but they often involve exploiting software weaknesses such as cross-site scripting (XSS) or SQL injection. Microsoft has a software design process that requires coders and programmers to go through threat analysis and consider how those threats can be mitigated. However, the do-it-yourself hobbyist, while well-meaning, doesn’t always have the security background to be aware of such attacks [5].
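SQL injection, one of the weaknesses mentioned above, can be shown in a few lines with Python’s built-in sqlite3 module. The table, rows and function names are hypothetical. The first function pastes user input into the SQL text, so a crafted value changes the query’s logic; the second uses a parameterized query, the standard defence, so the same value is treated purely as data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, password_hash TEXT)")
conn.execute("INSERT INTO users VALUES ('alice@example.com', 'x1'), ('bob@example.com', 'x2')")

def find_user_unsafe(email: str):
    # VULNERABLE: user input is pasted directly into the SQL text.
    query = f"SELECT email FROM users WHERE email = '{email}'"
    return conn.execute(query).fetchall()

def find_user_safe(email: str):
    # SAFE: the driver passes the value separately; it is never parsed as SQL.
    return conn.execute("SELECT email FROM users WHERE email = ?", (email,)).fetchall()

payload = "' OR '1'='1"
print(find_user_unsafe(payload))  # every row in the table comes back
print(find_user_safe(payload))    # []
```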

The users fell victim to a keystroke logger

There are possibilities other than a phish, hack or statistical hack. A user could have been the victim of a keystroke logger, and malware of this kind often arrives through social engineering. For example, Win32/Koobface spreads by sending messages to a victim’s social network contacts with text such as ‘You should watch my latest video’, accompanied by a URL. When recipients visit the link, they are told that they need to download an update to their Adobe Flash Player plug-in in order to view the video. However, the download is actually the Koobface installer.

Koobface attempts to gain access to users’ sensitive financial information such as credit card numbers. It can also redirect search engine traffic to malicious sites. While Koobface does not install a keystroke logger, other pieces of malware do. For example, Taterf is a family of worms that spreads through mapped drives in order to steal login information for popular online games (Taterf was the second most prevalent worm detected by Microsoft’s Malicious Software Removal Tool in the first half of 2009 [6]). Certain keystroke loggers can detect when a user visits Hotmail, Google or Yahoo!, and when they do, they log the keystrokes that a victim makes and send them back to the command-and-control centre. This gives the attacker access to the user’s information.

There are other ways to get infected, including downloading music from disreputable sites, installing pirated software, or visiting malicious web pages and becoming the victim of a drive-by download. In this instance, a piece of malware grabbing ‘only’ Hotmail passwords seems minor compared with one that steals financial data. However, it would not be unusual for an attacker to gain access to data this way, and a webmail password is a relatively innocuous piece of data to steal.

An increase in spam?

The range of possible attack vectors is wide, but the attack probably involved tricking users into unwittingly giving up their credentials rather than breaking into Hotmail and acquiring them that way. Now we shift our focus elsewhere: did we see an increase in spam from these compromised accounts?

Why would a spammer steal usernames and passwords from Hotmail, Yahoo! and Gmail only to give them up later? I can think of a few reasons:

  1. They stole the information to prove that they could do it in order to highlight the insecurity of the email space.

  2. They used the stolen information to set up accounts on Windows Live Spaces (a blog) or open SkyDrive accounts (to store spam images).

  3. They used the stolen information to send out volumes upon volumes of spam.

Option (1) is unlikely. People do not steal credentials these days to prove that they can, they do it for financial gain.

I cannot comment on (2) but I can comment on (3). Spam from services like Hotmail, Yahoo! and Gmail tends to be more difficult to filter because IP reputation filtering cannot be used without causing an unacceptably high level of false positives. I work for Microsoft Forefront Online, where amongst other tasks I collect various email statistics. For the last three months I have collected data on mail originating from IPs in these webmail services. I used the IPs in Hotmail’s SPF record, Gmail’s SPF record, and publicly available lists of Yahoo!’s IPs [7]. The chart below illustrates how much spam we receive from those three. I have normalized the values of the y-axis to hide the exact amount of spam that we receive from them.
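The article does not describe how the IP lists were compiled beyond naming the sources, but checking a sending IP against the addresses in an SPF record might look like the minimal sketch below, built on Python’s standard ipaddress module. It assumes the record is already in hand as a string and only handles ip4: and ip6: mechanisms; real records often delegate through include:, which requires DNS lookups this sketch omits. The record shown uses documentation address ranges and is not any provider’s actual SPF record.

```python
import ipaddress

def spf_networks(spf_record: str):
    """Collect the networks named by pass ('+' or unqualified) ip4:/ip6:
    mechanisms. include:, a:, mx: and redirect= need DNS and are skipped."""
    networks = []
    for term in spf_record.lower().split():
        term = term.removeprefix("+")  # '+' is the default qualifier
        for prefix in ("ip4:", "ip6:"):
            if term.startswith(prefix):
                networks.append(ipaddress.ip_network(term[len(prefix):], strict=False))
    return networks

def sent_from_listed_ip(sender_ip: str, networks) -> bool:
    """True if the sender's IP falls inside any listed network."""
    ip = ipaddress.ip_address(sender_ip)
    return any(ip in net for net in networks)  # v4-vs-v6 checks simply return False

# Hypothetical record using documentation address ranges.
record = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.7 ip6:2001:db8::/32 include:_spf.example.net ~all"
nets = spf_networks(record)
print(sent_from_listed_ip("192.0.2.45", nets))   # True
print(sent_from_listed_ip("203.0.113.9", nets))  # False
```

`str.removeprefix` needs Python 3.9 or later.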

The usernames and passwords were posted on 1 October 2009. Since then, the amount of spam we received from all three services has declined somewhat. What we did see were huge increases on 3 and 4 September followed by a rapid decline, a month before the information was posted. Yahoo! spam increased throughout September but eventually declined right before the passwords were posted, whereas the other two services returned to normal levels straight after the outbreak. I checked AOL’s statistics and they also saw a huge spike on 3–4 September, but otherwise showed no significant deviation from their norm.

To me, this suggests the following:

  1. Since the information was made public, there has not been an increase in spam.

  2. It is difficult to say whether or not these accounts actually were used to spam; the only way to verify this would be to have the account names and go back through our logs, searching for them. I don’t have the account usernames. I also do not know if the posted usernames are the full dataset that was compromised.

  3. There was a huge spike on 3–4 September, which may correlate to these accounts. If I were to hazard a guess, I’d say that the spammer abused the accounts for these two days (only) and then abandoned them. He then posted them a month later to boast about what he did and to hint that he could do it again in the future.

Then again, there could be no relation at all and this could all be a coincidence. Isolated events are notoriously difficult to detect because there is so much variation within day-to-day events – that is, it can be difficult to separate the signal from the noise. Patterns that occur over time are easy to spot, but incidents like this are less so.
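One common way to separate a burst like the one on 3 and 4 September from ordinary day-to-day noise is to compare each day against a robust baseline built from the days before it. The sketch below is not the method behind the chart: the data are made up, and the median/MAD baseline, the seven-day window and the 3.5 threshold are our assumptions. A median baseline is used because a plain mean and standard deviation would be inflated by the first burst day and could hide the second.

```python
from statistics import median

def flag_spikes(daily_counts, window=7, threshold=3.5):
    """Return indices of days that sit far above the preceding `window` days,
    scored against the median and median absolute deviation (MAD)."""
    spikes = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        med = median(history)
        mad = median(abs(x - med) for x in history)
        if mad == 0:
            continue  # flat history: no spread to measure against
        score = (daily_counts[i] - med) / (1.4826 * mad)  # ~z-score for normal data
        if score > threshold:
            spikes.append(i)
    return spikes

# Made-up normalized daily volumes: a noisy baseline with a two-day burst.
counts = [100, 104, 97, 101, 99, 103, 98, 102, 260, 240, 105, 99, 101, 97]
print(flag_spikes(counts))  # [8, 9]
```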

So, what do we know? We know that some users had their usernames and passwords stolen. We know that in early September, traffic from these services spiked. We know that a month later, the credentials were posted publicly and thus rendered useless. Whatever the motive for stealing the accounts and then discarding them, email and Internet security remain serious issues to this day.
