Thursday, September 09, 2010

Random Pseudo-URLs Try to Confuse Anti-Spam Solutions

For the past couple weeks things have NOT been normal for the Spam & Phishing folks at the UAB Computer Forensics Research Laboratory. The Phishing Operations team has been inundated by URLs being reported to them as "potential phish" that are not only not phish, they are not even URLs!

Here's a handful of the recently received URLs (in the past 5 minutes or so):

http://germany.aksumite.com/Union/measures.php?Leydesdorff=a1A55DF6e8
http://added.the.com/UK/as.php?Association=535BEdCfbbF
http://she.on.com/the/probably.php?export=FbB4876098a
http://marine.sara.com/capita/children.php?Playing=59F751345112
http://limited.customers.com/capita/are.php?some=11aB6193fa
http://national.office.com/capita/strictly.php?During=8683E94310aA
http://the.of.com/went/of.php?film=9EA1b1020C
http://contain.of.com/fault/produced.php?main=Aed30182b225
http://not.barcelona.com/in/o.php?also=D13089900a
http://early.due.com/the/Aquilii.php?Henry=2BD9Fb6684da
http://climate.is.com/In/North.php?led=F707a1de3B06
http://a.amber.com/restricted/as.php?and=75851d442EA

Without a "pattern", it's hard to "mass-whack" the URLs, so they keep ending up in our "Phishing-URLs-to-be-checked" list. The problem is that MOST of those domains actually exist!

amber.com is owned by "Future media architects"
is.com offers free domain names (so we see them sometimes on phish normally!)
due.com is a research & development incubator
barcelona.com is a tourism site for the city in Spain.
of.com claims to be the Online Finance Company.
office.com is a redirector to office.microsoft.com
customers.com is the Patricia Seybold Group
sara.com is the Scientific Applications & Research Association
on.com is a webcam chatting service
the.com is a parked domain on FirstLook.
Only "aksumite.com" is not "live" somewhere.

The UAB Spam Data Mine has been seeing similar things. We're accustomed to spammers creating a "wildcard" DNS entry for a domain, so that they can make up any random hostname they want and use 1,000 different machine names to refer to one Viagra sales website. We actually deal with that quite effectively, because once we have seen five hostnames for the same domain, we create a random hostname for that domain ourselves. If the content returned for our random hostname matches the content returned for a spammed hostname on that domain, we mark all of them as being related and stop checking the rest. When "normal" spammers are using many hostnames, we only see a few domains using this technique per 15-minute work period, so even though, for instance, on August 8th we saw 2,670,602 hostnames in spam (counting repeated hostnames), it wasn't such a big deal.
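To make the wildcard check concrete, here is a minimal Python sketch of the idea described above. It is not our production code: the class name, the `fetch` callable (which stands in for "download whatever this hostname serves"), and the use of a SHA-256 digest to compare content are all assumptions made for illustration. Only the threshold of five hostnames comes from the description above.

```python
import hashlib
import secrets
from collections import defaultdict
from typing import Callable, Dict, Set

PROBE_THRESHOLD = 5  # from the post: probe once five hostnames share a domain


def _digest(content: bytes) -> str:
    """Fingerprint page content so two fetches can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()


class WildcardChecker:
    """Hypothetical helper: decides whether a hostname still needs checking."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self.fetch = fetch  # assumed: returns the content served by a hostname
        self.seen: Dict[str, Set[str]] = defaultdict(set)
        self.wildcard: Dict[str, bool] = {}  # domain -> result of our probe

    def observe(self, hostname: str, domain: str) -> bool:
        """Return True if this hostname should still be checked individually."""
        if self.wildcard.get(domain):
            return False  # already known wildcard: treat as related, skip it
        self.seen[domain].add(hostname)
        if len(self.seen[domain]) >= PROBE_THRESHOLD and domain not in self.wildcard:
            # Make up a hostname the spammer never advertised.
            probe = f"{secrets.token_hex(8)}.{domain}"
            same = _digest(self.fetch(probe)) == _digest(self.fetch(hostname))
            self.wildcard[domain] = same
            return not same
        return True
```

In this sketch each domain is probed only once, which is why a handful of wildcard domains per work period is cheap to handle.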

The problem with this new spam is that rather than having one destination per 15 minute work period that has randomization for the domain, we may have thousands in a single 15 minute work period.

On September 5th we saw 450,976 UNIQUE hostnames advertised in spam! To put that in perspective, from August 1 until August 28 the highest single day unique hostname count we had was 38,452. On August 29th, we had 391,594 unique hostnames advertised in spam! A tenfold increase in a single day! And it's stayed there. We've had more than 370,000 every single day in September.

Or did we? My anti-spam friends RedDwarf and SiL were discussing this recently, over on the "InBoxRevenge" forums, and they mentioned that another lab had seen a dramatic jump in unique URLs beginning about August 26th. It's hard for me to see the same jump in unique URLs, because we see millions of URLs per day, and the number hasn't changed so dramatically -- but when we look at unique hostnames instead, we do see an enormous jump!



This corresponds to the second problem we observed in the lab. In our multi-phase spam parsing, phase two is "resolve the domains to IP addresses and store that data in a database." We started experiencing a backlog in that phase that was brought to my attention on September 1st. We hadn't put the two pieces together until last night when someone called attention to the RedDwarf posts on this topic.
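As a rough illustration of why unique hostnames drive the cost of that phase, here is a small Python sketch of a "resolve and store" step. The table layout, the function names, and the use of SQLite are assumptions for the example, not a description of our actual database. The resolver is passed in as a parameter so the logic can be exercised without touching DNS.

```python
import socket
import sqlite3
from typing import Callable, Iterable, List


def system_resolver(hostname: str) -> List[str]:
    """Resolve a hostname to IPv4 addresses; empty list if it does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def resolve_and_store(
    hostnames: Iterable[str],
    db: sqlite3.Connection,
    resolver: Callable[[str], List[str]] = system_resolver,
) -> int:
    """Resolve hostnames not yet stored; return how many lookups were made."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS resolutions ("
        "hostname TEXT, ip TEXT, PRIMARY KEY (hostname, ip))"
    )
    lookups = 0
    for hostname in {h.lower() for h in hostnames}:
        known = db.execute(
            "SELECT 1 FROM resolutions WHERE hostname = ? LIMIT 1", (hostname,)
        ).fetchone()
        if known:
            continue  # repeated hostnames cost nothing extra
        lookups += 1  # every new unique hostname costs one lookup
        for ip in resolver(hostname):
            db.execute(
                "INSERT OR IGNORE INTO resolutions VALUES (?, ?)", (hostname, ip)
            )
    db.commit()
    return lookups
```

Repeated hostnames are free in a design like this, but every new unique hostname costs a lookup, so a tenfold jump in unique hostnames means roughly ten times the resolution work. Note that in this sketch a hostname that fails to resolve is not recorded and would be retried next time.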

I ran another query to count how many times we have seen each unique DOMAIN name -- not HOST name -- and the tail goes "to infinity and beyond" on this chart!

In the first seven days in September, we saw 149,964 unique DOMAIN names used in spam!!!

I tried to chart the distribution of domain names, but the chart ends up looking like I've shown you an empty chart because the tail is SO long and the drop-off is so dramatic. I'll try it as a table instead:
30 domains: 25,000+ times
132 domains: 10,000 - 24,999 times
1,051 domains: 1,000 - 9,999 times
5,818 domains: 100 - 999 times
21,417 domains: 10 - 99 times
13,907 domains: 5 - 9 times
39,580 domains: 2 - 4 times
68,030 domains: 1 time
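For anyone who wants to build the same kind of table from their own data, a short Python sketch follows. It assumes you already have a mapping of domain name to the number of times it was seen. The bucket boundaries match the table above, while the function and variable names are made up for the example.

```python
from typing import Dict

# (lower bound, label), checked from the largest bucket down
BUCKETS = [
    (25_000, "25,000+"),
    (10_000, "10,000 - 24,999"),
    (1_000, "1,000 - 9,999"),
    (100, "100 - 999"),
    (10, "10 - 99"),
    (5, "5 - 9"),
    (2, "2 - 4"),
    (1, "1"),
]


def bucket_domains(counts: Dict[str, int]) -> Dict[str, int]:
    """Count how many domains fall into each 'times seen' bucket."""
    table = {label: 0 for _, label in BUCKETS}
    for seen in counts.values():
        for lower, label in BUCKETS:
            if seen >= lower:
                table[label] += 1
                break  # a domain belongs to exactly one bucket
    return table
```

A table like this sidesteps the charting problem: the long tail of single-use domains gets its own row instead of flattening the chart.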


An analysis of how often these "pseudo" domain names appear helps us to understand that the selection process for these host names is NOT a random selection from a dictionary, but rather a random selection from a large text sample. We know this by the frequency of commonly occurring words. During that same period, here is the count by domain name for the spam:

179,958 - the.com - #1
100,255 - of.com - #2
74,603 - and.com - #4
66,104 - in.com - #6
47,307 - a.com - #5
42,713 - to.com - #3
28,217 - is.com - #7
20,051 - for.com - #13
18,234 - as.com - #17
18,097 - by.com - #29
17,512 - was.com - #12
16,178 - on.com - #14
14,962 - with.com - #16
13,990 - from.com - #26
13,879 - are.com - #15

The number following the domain name is that word's rank in "The Most Common Words in English." That the counts don't follow the true ranking probably means that while the spammers have a large language sample, it's not a truly enormous one, or we would see a truer frequency distribution.
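The per-domain counts above come from collapsing each hostname down to its domain. A minimal Python sketch of that step is below. It naively treats the last two labels of the hostname as the domain, which is an assumption that works for the ".com" and ".ru" names in this post but would be wrong for suffixes such as ".co.uk". The function names are invented for the example.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit


def registered_domain(url: str) -> str:
    """Naive domain extraction: keep the last two labels of the hostname."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.lower().split(".")[-2:])


def count_domains(urls: Iterable[str]) -> Counter:
    """Count how often each domain appears in a list of URLs."""
    # URLs without a scheme have no parsed hostname and are skipped.
    return Counter(registered_domain(u) for u in urls if urlsplit(u).hostname)


sample = [
    "http://the.of.com/went/of.php?film=9EA1b1020C",
    "http://contain.of.com/fault/produced.php?main=Aed30182b225",
    "http://added.the.com/UK/as.php?Association=535BEdCfbbF",
]
print(count_domains(sample).most_common())
```

Comparing the resulting order against a word-frequency list is then a matter of looking up each domain's first label.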

The first possible "double-usage" domain comes here:

13,526 - apple.com

Apple is not one of the 500 most common words in English.


Clearly most of these are NOT going to be "Pseudo-URLs", as we know that "apple" is not nearly as common a word as "are" and "from". In fact, most of the emails we have with apple.com in them are unlikely to be spam at all. Other domains that we saw with at least this high a count are either "whitelisted" domains or they are clearly "spam" domains. (The list below has "whitelisted" domains suppressed.)

58,039 - x-misc.com
53,504 - greatillsmeds.com
41,012 - larnebaptist.com
39,629 - liede.ch (???)
33,724 - modestbusy.ru
31,613 - factwent.ru
31,579 - rs6.net
29,704 - cornermust.ru
28,802 - citznet.com (???)
27,752 - wishroot.ru
27,572 - giftedcorner.ru
27,184 - frontbody.ru
25,928 - twosheet.ru
25,583 - tweakdsl.nl
24,164 - fellapple.ru
23,800 - theycame.ru
23,300 - lengthwant.ru
23,152 - rosynext.ru
23,004 - waveand.ru
22,808 - alwayswhich.ru
22,800 - loftyneck.ru
22,784 - feelat.ru
22,100 - outfollow.ru
21,696 - ifdrool.ru
21,648 - colonyread.ru
21,144 - nowgrass.ru
20,892 - rowletter.ru
20,880 - tech-logol.ru
20,530 - guessflower.ru
20,460 - bymaxi.ru
20,416 - twentyshare.ru
20,400 - yulepitch.ru
20,360 - brownsingle.ru
20,236 - drawbetter.ru

When we get down to the single-use domains, it becomes clear again that the "word list" for these randomly created domains is not a dictionary. We have words like "bariloche", "doughton", "vignarajah", and "okjeo", which does seem to lend credence to the idea that has been floated that these are words selected from Wikipedia.


Example One: Pharmacy Express


But what does the spam actually LOOK like? And what does it do?

(Click here to see the original email)

Here is an example image from the spam:



In the spam message that used this image, the image was loaded from the URL:

belo.tweakdsl.nl/novice76.jpg

and clicking on the spammed image would take the visitor to:

belo.tweakdsl.nl/novice76.html

which contained an auto-forwarder that would have sent the visitor instead to:

www.optomemed.com

Which is a Pharmacy Express pill sales site hosted on 109.196.130.100:




Please note that "tweakdsl.nl" is a compromised domain of the kind we discussed in our August blog article Viagra Spammers as Hackers, where compromised domains were used as spam targets and redirected the visitors to a Pharmacy Express domain.


The NOISE in that spam message, however, includes links to non-existent images, including:

http://General.IT.com/but/Internet.jpg
http://of.was.com/increased/to.jpg
http://two.who.com/include/with.jpg
http://nine.introduced.com/Massachusetts/It.jpg
http://of.Michael.com/many/Thar.jpg

Then there is a block of text, hidden from the email recipient by a "span style" tag that reads:

The variation among the German dialects is considerable, with only the neighboring dialects being mutually intelligible. NYS School of Industrial and Labor Relations. The then-reigning government (cabinet Persson) stated that they would only take into consideration the results of the referendum in Stockholm Municipality. They too have been deaf to the voice of justice and of consanguinity. The country accounts for two-fifths of global military spending and is a leading economic, political, and cultural force in the world. A National Public Radio affiliate, and Public Broadcasting Service television station WPBA 30. Stadiums with a capacity of more than 40,000. New York City at the Open Directory Project. The other professional rugby union team in the city is second division club London Welsh, that plays home matches in the city. A sense of Indonesian nationhood exists alongside strong regional identities.

Mixed in among that text are additional non-existent image tags:
http://to.Bight.com/directed/once.jpg
http://letter.Desert.com/States/objectives.jpg
http://so.initial.com/tress/Southeast.jpg
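To show how a parser might separate what the recipient sees from what the spammer hid, here is a small Python sketch using the standard library's HTML parser. The post does not say which style rule the spammers used, so the list of "hiding" styles below is purely an assumption, and the class and attribute names are invented for the example.

```python
from html.parser import HTMLParser

# Assumed style fragments that hide content; the actual spam may differ.
HIDING_HINTS = ("display:none", "visibility:hidden")


class HiddenTextExtractor(HTMLParser):
    """Split an HTML email body into visible text, hidden text, and image URLs."""

    def __init__(self):
        super().__init__()
        self._stack = []  # one bool per open <span>: does it hide its content?
        self.visible = []
        self.hidden = []
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span":
            style = (attrs.get("style") or "").replace(" ", "").lower()
            self._stack.append(any(hint in style for hint in HIDING_HINTS))
        elif tag == "img" and attrs.get("src"):
            self.image_urls.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Text is hidden if any enclosing span hides its content.
            (self.hidden if any(self._stack) else self.visible).append(text)
```

Feeding a message body through `feed()` and then calling `close()` leaves the noise text in `hidden` and the bogus image links in `image_urls`, so they can be weighed separately from what the reader actually sees.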

Example Two: Canadian Pharmacy



(Click here to see the original email)

The group above actually hasn't been so troublesome, because we don't bother to resolve every .jpg URL that comes through our spam. The explosion actually has come from the group described in THIS example. In this email there is a large number of ".php" URLs mixed into the hidden data.

The image displayed in the spam has a randomly created name itself, in this case anchored to the "real" domain "waveand.ru". In this example the graphic file was retrieved as:

http://played.waveand.ru/Australian.gif

but, just as a test, I told my browser to load instead "heresatest.waveand.ru/sillypicture.gif", a name that I just made up. I get the same image either way. In fact, any machine name with any file name that ends in ".gif" will show you the same graphic if the domain name is "waveand.ru".



The same is true for the URL that you are directed to if you click on the graphic in your spam. Going to "waveand.ru" with any machine name and NO filename will cause the autoforwarder to send you to:

www.wholesalemedicalbook.com

Which looked like this when we visited:



There are currently thirty "real" spammer domains, each of which functions in the same way as the "waveand.ru" domain:

allfeet.ru
armlate.ru
busytree.ru
comeforest.ru
cornermust.ru
drawbetter.ru
eastlegacy.ru
fatherany.ru
fellapple.ru
frontbody.ru
fruitproud.ru
gatherfast.ru
hardyyear.ru
knowdid.ru
leddrive.ru
legendwhy.ru
loftyneck.ru
magnetfar.ru
middlepalate.ru
nowgrass.ru
quickdo.ru
rivernote.ru
rowletter.ru
solvehealth.ru
studysound.ru
tenpoem.ru
theycame.ru
typeletter.ru
waveand.ru
zappair.ru


The Pseudo URLs in this email included:

http://FW.Health.com/the/sometimes.php?pitch=4F2db0aA031
http://n.missing.com/served/The.gif
http://There.Mickey.com/data/wooden.php?Seattle=2c8576DC866
http://summer.and.com/The/the.gif
http://Roger.Los.com/islands/violin.php?altitude=fb75d25708a
http://R.Many.com/Includes/what.gif
http://after.December.com/nodes/the.php?his=22CAfE6c8966
http://from.the.com/also/in.gif
http://the.over.com/a/domes.php?of=aE1E51938890
http://society.first.com/was/guitar.gif
http://has.of.com/the/Jovan.gif
http://and.South.com/quest/showing.gif
http://Html.must.com/social/December.gif

And the same "style span" trick was used to hide text intended to confuse spam filtering systems, which in this example read:

Expansion of transportation options encouraged economic expansion. Kahn then founded LightSurf in 1998. Temperate grasslands, savannas, and shrublands. Subtracted from 10, that leaves a result from 1 to 10. After the Cold War, the 86th was realigned to become an Airlift Wing, which it remains today. Just under three quarters of Australia lies within a desert or semi-arid zone. Since this ion is three steps removed from atmospheric CO 2, the level of inorganic carbon storage in the ocean does not have a proportion of unity to the atmospheric partial pressure of CO 2. Japanese Journal of Religious Studies 33. The law went into effect in March 16, 2006, garnering much local and national media attention. Wars involving the Illinois Country, Illinois Territory, and State of Illinois. NGS FAQ - What is a geodetic datum. Archived from the original on 6 July 2010. Australia is also powerful in track cycling, rowing, and swimming, having consistently been in the top-five medal-winners at Olympic or World Championship level since 2000. The SI units for both systems are summarized in the following tables. Time has seen significant improvements in the usability and effectiveness of computer science technology. The ISBN separates its parts (group, publisher, title and check digit) with either a hyphen or a space.
