NEWS FROM THE LAB - Friday, January 16, 2009

Calculating the Size of the Downadup Outbreak
Posted by Toni @ 13:59 GMT

The number of Downadup infections is skyrocketing, based on our calculations: from an estimated 2.4 million infected machines to over 8.9 million during the last four days. That's just amazing.

We've received a number of queries on just how exactly we're producing our estimates.

There's been interest from Internet operators, CERTs, and fellow antivirus researchers.

There have also been several posts in our blog comments doubting our numbers. Here are some sample quotes:
   Kitschen: Yeah right! As if you could "estimate" infection
   to a precision of 10 machines. This is just PR.
   Your "special techniques" are at best able to estimate 100000.

   wastedimage: This number looks like total guesswork.
   How did you go from ~100k ip's to 2.4 million boxes?
   I realize *some* might be nat but how many ?
   Did you just assume each ip really was some arbitrary number
   of vulnerable machines or something?
   Spreading FUD like this is incredibly unprofessional.

   wastedimage: So your trusting the counter built into the bot itself
   which may be rigged to indicate larger numbers to entice spammers
   to pay more for its use. Sure that sounds like a solid plan.

So let us explain how we are generating the numbers.

There are several different variants of Downadup out there. The algorithm that creates the domain names varies a bit between the variants. We've been tracking the variant we believe to be the most common. It creates 250 possible domains each day. We've registered some selected domains out of this pool and are monitoring the connections being made to them.

This is what the connections look like:

Downadup logs

As you can see, this is a standard httpd log showing the IP address of the machines connecting to our domains, the time stamp (the queries in the above image all arrived in the same second: 18:16:05 yesterday), the actual query ("GET /search?q=29 HTTP/1.0"), and the User-Agent of the machine.

These are the raw connections coming to our sinkhole systems. Millions of them every day. When we sort these connections by source, we see hundreds of thousands of unique IP addresses every day (over 350,000 today).
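
As a rough illustration of the per-source count, here is a minimal Python sketch. It assumes the client IP address is the first whitespace-separated field of each log line, as in the common httpd log formats; the function name count_unique_sources is made up for this example and is not the tool we actually run.

```python
def count_unique_sources(lines):
    """Count distinct source addresses in an iterable of httpd log lines.

    Assumption: the client IP is the first whitespace-separated field,
    as in the Common Log Format. Blank lines are ignored.
    """
    sources = set()
    for line in lines:
        fields = line.split(maxsplit=1)
        if fields:
            sources.add(fields[0])
    return len(sources)
```

A count like this only says how many addresses contacted the sinkhole, not how many machines sit behind each address.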

It's hard to tell the real number of infections since NAT boxes and proxies tend to spoil the fun and Downadup doesn't include a unique identifier within the User-Agent string for us to see.

We first tried to count unique User-Agent headers per IP address, but the results weren't very good: in a standardized corporate network, most machines have identical User-Agents.

So, with a little digging, we discovered that in the /search?q=NUMBER query, the number is not random. It's basically a global variable in the code that gets incremented (thread-safely, through InterlockedIncrement) every time the malware has successfully exploited a machine via MS08-067. The increment is done in the httpd thread of the malware.

So this number tells us how many other computers this machine has exploited since it was last restarted. In the above log you can see one of the machines has exploited 116 computers.

Do bear in mind that this number only shows how many machines got infected via the MS08-067 exploit. Downadup spreads at least as much via network shares and USB sticks.

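We wrote a program that parses the logs, extracting the highest "q" value for each IP/User-Agent pair. These values are then added together to get our figures. As you can see now, the figures are very conservative: they only count machines exploited via MS08-067 since each reporting machine was last restarted.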
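
The counting step can be sketched roughly as follows. This is not our actual program, just a minimal Python illustration of the idea. It assumes the logs are in the Apache "combined" format (the real sinkhole format may differ), and the names LOG_LINE and estimate_infections are invented for this example.

```python
import re
from collections import defaultdict

# Assumed layout (Apache "combined" format):
# ip ident user [time] "GET /search?q=N HTTP/x.y" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"GET /search\?q=(?P<q>\d+) HTTP/[\d.]+" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def estimate_infections(lines):
    """Sum the highest q value seen for each (IP, User-Agent) pair."""
    highest = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not Downadup-style queries
        key = (match.group("ip"), match.group("agent"))
        highest[key] = max(highest[key], int(match.group("q")))
    return sum(highest.values())
```

Taking the maximum per pair, rather than adding up every reported value, keeps a machine that checks in many times from being counted more than once.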

And they are showing more than 8 million infected machines right now.

The situation with Downadup is not getting better. It's getting worse.