(update: report link URL at end has been corrected...)
In the UAB Computer Forensics program, we have students who are studying the basics of cybercrime, but we also have students who are Malware Researchers, Phishing Researchers, and of course Spam Researchers. Much of our research is enabled by our main research project, the UAB Spam Data Mine.
For some of you, the first glimpse you had of the power of the UAB Spam Data Mine was in last Friday's entry, "Linking All the News Spam Together." In that example, we made a recursive SQL query, asking "what other spam was sent by the computers that sent this spam?" That's one of the most basic queries we can do: give us a spam "subject", and we can find all the IPs that sent that subject, and then all of the other emails those same IPs sent. We can do much cooler things than that, and I'll try to tell you about one each week.
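For readers who want to see the shape of that "subject to IPs to other spam" question, here is a minimal sketch using Python's built-in sqlite3 module. The one-table schema, the column names and the sample rows are assumptions made for illustration; they are not the actual Spam Data Mine layout.

```python
import sqlite3

# Hypothetical, simplified schema: only the three columns this question needs.
SCHEMA = """
CREATE TABLE spam (
    message_id TEXT PRIMARY KEY,
    subject    TEXT,
    sender_ip  TEXT
);
"""

# Inner query: every IP that sent the seed subject.
# Outer query: every message those same IPs sent, whatever its subject.
RELATED_SPAM = """
SELECT message_id, subject, sender_ip
FROM spam
WHERE sender_ip IN (
    SELECT DISTINCT sender_ip FROM spam WHERE subject = ?
)
ORDER BY message_id;
"""


def related_spam(conn, seed_subject):
    """Return (message_id, subject, sender_ip) rows sent by any IP that sent seed_subject."""
    return conn.execute(RELATED_SPAM, (seed_subject,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO spam VALUES (?, ?, ?)",
        [
            ("m1", "Breaking news", "10.0.0.1"),
            ("m2", "Cheap watches", "10.0.0.1"),
            ("m3", "Breaking news", "10.0.0.2"),
            ("m4", "Online pharmacy", "10.0.0.2"),
            ("m5", "Unrelated", "10.0.0.3"),
        ],
    )
    for row in related_spam(conn, "Breaking news"):
        print(row)
```

Running it lists m1 through m4: the two "Breaking news" messages plus the other spam their senders pushed, while m5 (from an IP that never sent the seed subject) is left out.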
In today's post we'll demonstrate one of the ways the Malware team can benefit from queries from the UAB Spam Data Mine.
On a mailing list (Gar waves to Paul), someone mentioned some new malware that was advertised by a spam message with the subject "Shocking porno dvd Carmen Electra". The URL in that spam message pointed to a website ending in "index1.php".
Visiting the website downloaded a virus named video312f3sxxx.exe.
We knew that we had seen lots of spam with "index1.php", but wondered how many different versions of the virus we could still find "live" on the Internet.
This wasn't intended to be an exhaustive search, so we didn't worry about proving we had every single email in this cluster, although the spam clustering algorithms are advancing to the point where that is entirely possible. For today, we just ran another very simple query:
"Let's find all of the spam where the subject contained the word 'Shocking' and the message contained a URL ending in 'index1.php'." (The '%hocking%' pattern matches both "Shocking" and "shocking".)
select a.message_id, a.subject, a.sender_ip, b.machine, b.path
from spam a, spam_link b
where (a.message_id = b.message_id)
and subject like '%hocking%' and path like '%/index1.php%';
This returned more than 1600 emails. Modifying the query, we added "group by machine" to produce a list of the 261 unique websites that had been advertised as hosting an "index1.php" file.
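A sketch of that grouped variant follows, run against a toy in-memory SQLite database. The table and column names come from the query above, but the schema is otherwise an assumption, and the select list is trimmed to the grouped column plus a count.

```python
import sqlite3

# Hypothetical schema modelled on the column names used in the query above;
# the real tables almost certainly carry more columns.
SCHEMA = """
CREATE TABLE spam (message_id TEXT PRIMARY KEY, subject TEXT, sender_ip TEXT);
CREATE TABLE spam_link (message_id TEXT, machine TEXT, path TEXT);
"""

# Same filter as the original query, grouped so that each advertised host
# appears once, with a count of how many matching messages pointed at it.
SITES_QUERY = """
SELECT b.machine, COUNT(*) AS messages
FROM spam a
JOIN spam_link b ON a.message_id = b.message_id
WHERE a.subject LIKE '%hocking%'
  AND b.path LIKE '%/index1.php%'
GROUP BY b.machine
ORDER BY b.machine;
"""


def advertised_sites(conn):
    """Return (machine, message_count) for every host advertised with index1.php."""
    return conn.execute(SITES_QUERY).fetchall()
```

Selecting only the grouped column (plus aggregates) is a deliberate choice: MySQL historically accepted "group by machine" alongside non-grouped columns such as message_id, but strict SQL modes like ONLY_FULL_GROUP_BY, and most other databases, reject that form.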
That list was passed to a simple "wget" script, which fetched the content of index1.php and followed any links it led to, as long as they stayed on the same site.
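We didn't publish the script itself, so the following is only a guess at what such a wrapper might look like. It builds one wget command per (machine, path) pair returned by the query. The depth, timeout and folder layout are assumptions. The "same site" rule comes for free, because wget's recursive mode does not leave the starting host unless --span-hosts is given.

```python
import subprocess
from pathlib import Path


def wget_command(machine, path, out_dir="fetched", depth=3):
    """Build a wget argument list that fetches one advertised index1.php URL.

    --recursive makes wget follow links in the pages it downloads; because
    --span-hosts is deliberately NOT passed, wget stays on the starting host.
    """
    url = f"http://{machine}{path}"
    return [
        "wget",
        "--recursive",
        f"--level={depth}",   # assumed depth: index1.php -> index*.html -> .exe
        "--tries=1",          # dead sites should fail fast
        "--timeout=20",
        f"--directory-prefix={Path(out_dir) / machine}",  # one folder per site
        url,
    ]


def fetch_all(sites, out_dir="fetched"):
    """Run wget for each (machine, path) pair and return {machine: exit code}.

    A non-zero code does not always mean the site is dead: wget exits with 8
    if any single link returned a server error, so also check whether files
    actually landed in the site's folder.
    """
    codes = {}
    for machine, path in sites:
        result = subprocess.run(wget_command(machine, path, out_dir), check=False)
        codes[machine] = result.returncode
    return codes
```

Usage would look like `fetch_all([("a.example", "/x/index1.php")])`, where the host name is a placeholder.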
Of the 261 websites which had been advertised by this spam, 71 of them were still "live", and gave us 578 different files. In most cases, here's how the fetch worked:
Pulling "index1.php" would send us to a web page, often named either "index6.html" or "index12.html". That page displayed an animated GIF which, when clicked, downloaded the actual virus as an ".exe" file.
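To pull the download URLs out of those landing pages without clicking anything, one option is a small parser that looks for a link ending in ".exe" that wraps an image ending in ".gif". This is a sketch built on Python's standard html.parser. The sample HTML in the test is invented, and pages that start the download from JavaScript would slip past it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class GifExeLinkFinder(HTMLParser):
    """Collect hrefs of <a> tags that end in .exe and wrap an <img> ending in .gif."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self._current_href = None  # href of the <a> we are inside, if any
        self._saw_gif = False      # whether that <a> contains a .gif image
        self.exe_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._current_href = attrs.get("href")
            self._saw_gif = False
        elif tag == "img" and self._current_href is not None:
            src = (attrs.get("src") or "").lower()
            if src.endswith(".gif"):
                self._saw_gif = True

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            if self._saw_gif and self._current_href.lower().endswith(".exe"):
                # Resolve relative links against the page they came from.
                self.exe_links.append(urljoin(self.base_url, self._current_href))
            self._current_href = None
            self._saw_gif = False


def find_exe_links(base_url, html):
    """Return absolute URLs of .exe downloads hidden behind .gif images."""
    finder = GifExeLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.exe_links
```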
Here are the five animated .gifs which were found on the different versions of the websites:
The next step was to find out how many virus "versions" we were dealing with, and whether they were well known or not. We ended up with 40 different MD5 values, and 40 different filenames:
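Counting "versions" by hash rather than by name matters because the same binary can be served under different filenames, and different binaries can share one name. Here is a minimal sketch of that tally over a folder of downloaded files, using hashlib. The folder layout is an assumption.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_file(path, chunk_size=65536):
    """MD5 hex digest of a file, read in chunks so large files are not loaded at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_md5(paths):
    """Map each MD5 digest to the sorted list of file paths with that exact content."""
    groups = defaultdict(list)
    for p in paths:
        groups[md5_file(p)].append(str(p))
    return {digest: sorted(ps) for digest, ps in groups.items()}


def summarize(paths):
    """Return (number of distinct MD5 values, number of distinct filenames)."""
    paths = list(paths)
    distinct_names = {Path(p).name for p in paths}
    return len(group_by_md5(paths)), len(distinct_names)
```

Something like `summarize(Path("fetched").rglob("*.exe"))` would produce the two counts quoted above, assuming the downloads sit under a "fetched" folder.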
The PDF table of the websites we reviewed, 08aug08.report.pdf, lists all 74 live websites from which we received malware in today's check of the 261 sites, along with the filename, MD5, size, and date of each .exe file.
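The columns in that table can be gathered mechanically. The sketch below writes one CSV row per downloaded file instead of a PDF. It takes the file's modification time as its "date", on the assumption that wget was left at its default of stamping each download with the server's Last-Modified time. The field names are our own choice.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["site", "filename", "md5", "size", "date"]


def report_row(site, path):
    """Build one report row (site, filename, MD5, size in bytes, date) for a downloaded file."""
    p = Path(path)
    st = p.stat()
    return {
        "site": site,
        "filename": p.name,
        "md5": hashlib.md5(p.read_bytes()).hexdigest(),
        "size": st.st_size,
        # wget normally sets the local mtime from the server's Last-Modified header.
        "date": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).strftime("%Y-%m-%d"),
    }


def write_report(rows, out):
    """Write rows as CSV (header first) to any text file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```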
The malware sites are everywhere . . . Argentina, Brazil, Canada, the Czech Republic, Denmark, France, Germany, Hong Kong, Italy, Mexico, Poland, Portugal, Romania, Spain, Switzerland, Turkey, Venezuela. (See IP WHOIS spreadsheet, or Domain WHOIS list.)
The malware from these sites will now be "unpacked" and analyzed by the malware researchers, who have already looked at many of these samples. For example, the "news_usama_video.exe" they examined last week contained several useful clues, such as the IP address of the malware's Command & Control (C&C) site, the format of the communications to and from that C&C, and an internal version number. The malware in this family that we looked at two weeks ago labeled itself "1.0.4" internally; last week's version called itself "1.0.5". Several of the earlier versions were proven to be related because they all pointed to the same Command & Control site, even though their MD5 values differed.
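That last point, linking samples by shared C&C rather than by hash, amounts to a simple grouping step once unpacking has produced a C&C address and version string for each sample. The sketch below assumes such per-sample records exist. The field names ("md5", "cnc", "version") and the example addresses are hypothetical, not output from the researchers' actual tools.

```python
from collections import defaultdict


def group_by_cnc(samples):
    """Group unpacked samples by C&C address.

    samples: iterable of dicts with hypothetical keys 'md5', 'cnc', 'version'.
    Returns {cnc: {"md5s": [...], "versions": [...]}} with duplicates removed.
    Versions are sorted as plain strings: fine for "1.0.4" vs "1.0.5",
    but not a true version ordering (e.g. "1.0.10").
    """
    md5s = defaultdict(set)
    versions = defaultdict(set)
    for s in samples:
        md5s[s["cnc"]].add(s["md5"])
        versions[s["cnc"]].add(s["version"])
    return {
        cnc: {"md5s": sorted(md5s[cnc]), "versions": sorted(versions[cnc])}
        for cnc in md5s
    }


def related_families(samples):
    """Keep only C&C addresses shared by more than one distinct MD5."""
    return {
        cnc: info
        for cnc, info in group_by_cnc(samples).items()
        if len(info["md5s"]) > 1
    }
```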
Spam => Data Mine => Reports => WGets => Unpacking => Analysis