Monday, November 18, 2019

Facebook's Transparency Report: (Expert) Supervised Machine Learning Works!

Last summer the BBC technology program "Click" came to visit the lab for a special called "Can Technology Solve the Opioid Crisis?"  One of the points we stressed with @NickKwek was that when we report opioid and fentanyl-related posts to Facebook, the objective is not to take down THAT POST, but rather to help Facebook's automated tools update their models of what offensive drug sales content looks like.

Last week we had an opportunity to see what that looks like in action as Facebook released their transparency report for Q3 2019.  Facebook's Transparency report is divided into two major sections which each have two subsections. "Enforcement of our Standards" covers "Community Standards Enforcement" and "Intellectual Property Infringement."  The other major section, "Legal Requests" is divided into "Government Requests for User Data" and "Content Restrictions Based on Local Law."

The November 2019 transparency report for Community Standards looks at ten categories of content on Facebook and four categories of content on Instagram.

In this post, we'll look primarily at the statistics for "Regulated Goods: Drugs and Firearms" but the other categories on Facebook are:

  • Adult Nudity and Sexual Activity
  • Bullying and Harassment
  • Child Nudity and Sexual Exploitation of Children
  • Fake Accounts
  • Hate Speech
  • Spam
  • Terrorist Propaganda
  • Violent and Graphic Content
  • Suicide and Self-injury
On Instagram, the other categories are:
  • Child Nudity and Sexual Exploitation of Children
  • Suicide and Self-injury
  • Terrorist Propaganda
Facebook has previously shared information about our work to reduce terrorist content on their platform.  See their "Hard Questions" blog post -- "Are We Winning the War on Terrorism Online?"  In this most recent report, they share that "Our proactive rate for detecting content related to al-Qaeda, ISIS and their affiliates remained above 99% in Q2 and Q3 2019, while our proactive rate for all terrorist organizations in Q2 and Q3 2019 is above 98%."

What does that mean?  It means that through the power of machine learning, when someone posts content trying to "express support or praise for groups, leaders, or individuals involved in terrorist activities" the content is removed automagically without the need for anyone to report it 98-99% of the time!

They've also previously discussed our relationship regarding the Opioid Crisis.  See their post "Supporting Our Community in the Face of the Opioid Epidemic." 

As Facebook has focused on identifying drug-related content, the number of detections has risen.  That's likely for two reasons -- one, they are now discovering content that previously would have gone unreported; and two, frustrated users are attempting to post their drug sales information in more ways to get past the blocks -- and largely failing to do so.

Drug related posts actioned:
  • 572,400 posts in Q4 2018
  • 841,200 posts in Q1 2019 
  • 2,600,000 posts in Q2 2019 
  • 4,400,000 posts in Q3 2019
When I attended Facebook's Faculty Summit all the way back in 2016, they had me hooked from the very beginning of the day, when Facebook's Engineering Director Joaquin Quinonero Candela gave his opening keynote.  All of the amazing machine learning technology that people like Dr. Candela had created to help improve online ad delivery was ALSO being used to make the platform as safe as possible against a wide variety of threats.  I was especially excited to learn about the work of Wendy Mu.  At the time, Wendy's bio said "Wendy is an engineering manager on Care Machine Learning, which leverages machine learning to remove abusive content from the site.  Over the last three years at Facebook, she has also worked on Site Integrity, Product Infrastructure, and Privacy."  Wendy and her team are inventing and patenting new ways of applying machine learning to this problem space.  Nektarios Leontiadis, "a research scientist on the Threats Infrastructure Team" with a PhD in online crime modeling and prevention from Carnegie Mellon, and Jen Weedon, previously at FireEye, were some of the other folks I met there who made such a profound impression on me!  Since then, the UAB Computer Forensics Research Lab has partnered with Facebook on many projects, but quite a few have taken the form of "what would a human expert label as offending content in this threat space?"

This is where "supervised machine learning" comes into play.  

The simplest version of Supervised Machine Learning is the "I am not a Robot" testing that Google uses to label the world.  You may be old enough to remember when Google perfected their Google Books project by asking us to human-label all of the unreadable words that their scanner lifted from old books, but which were not properly recognized by their OCR algorithm.  Then we were asked to label the address numbers found on buildings and mailboxes, and then later to choose cars, bicycles, traffic lights, and more recently crosswalks, as it seems we are now teaching future self-driving cars how to not drive over pedestrians.

This works well for "general knowledge" types of supervised learning.  Anyone over the age of three can fairly reliably tell the difference between a Cat and a Dog.  When people talk about supervised machine learning, that is the most common example, usually illustrated with a "Convolutional Neural Network."  Do a search on "machine learning cat dog" and you'll find ten thousand example articles, such as this image from Booz Allen Hamilton.
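To make the idea concrete, here is a minimal sketch of supervised learning in pure Python.  It uses a nearest-centroid classifier rather than a neural network, and the two-dimensional features and labels are invented for illustration -- this has nothing to do with Facebook's actual models:

```python
# Toy supervised learning sketch: each training example is a
# (features, label) pair supplied by a human labeler.  A trained
# model then predicts labels for unseen examples.

def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def train(labeled_examples):
    """Compute one centroid per label (nearest-centroid classifier)."""
    by_label = {}
    for features, label in labeled_examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}

def predict(model, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(model, key=lambda label: dist(model[label], features))

# Invented two-dimensional features: (ear pointiness, snout length)
training_data = [
    ([0.9, 0.2], "cat"), ([0.8, 0.3], "cat"),
    ([0.3, 0.9], "dog"), ([0.2, 0.8], "dog"),
]
model = train(training_data)
print(predict(model, [0.85, 0.25]))  # a cat-like example -> "cat"
```

The "supervision" is the human-supplied labels; the quality of the model depends entirely on the quality of the labelers, which is the point of bringing subject-matter experts into the loop.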

Booz Allen Hamilton infographic 

We're working on something slightly different, in that the labeling requires more specialized knowledge than "Cat vs. not Cat".   Is this chemical formula a Fentanyl variant?  Is the person in this picture the leader of a terrorist organization?  What hashtags are opioid sellers using to communicate with one another once their 100 favorite search terms are being blocked by Facebook and Instagram?

Facebook Research has a nice set of videos that explain some of the basics of Machine Learning that are shared as part of the "Machine Learning Academy" series:

In this chart, the data provided by UAB is primarily part of that "Data Gathering" section ... by bringing forensic drug chemists into the lab, we're able to provide a more sophisticated set of "labelers" than the general public.  Part of our "Accuracy testing" then comes in on the other end.  After the model built from our data (and the data from other reporters) is put into play, does it become more difficult for our experts to find such content online?

Looking at the Transparency Report's Community Standards section, the results are looking really great!  

In the fourth quarter of 2018, only 78.6% of the offending drug content at Facebook was being removed by automation.  The other 21.4% didn't get deleted until a user reported it by clicking through the content reporting buttons.  By the 3rd Quarter of 2019, 97.6% of offending drug content at Facebook was removed by automation!

In Q4 2018, 122,493 pieces of drug content were "manually reported" while 449,906 pieces were "machine identified."

In Q3 2019, 105,600 pieces of drug content were "manually reported", but now about 4.3 million pieces were "machine identified."  
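The proactive rate behind those percentages is simple arithmetic -- machine-identified content divided by total content actioned.  A quick check using the figures above:

```python
# Proactive (machine-identified) rate = machine / (machine + manual),
# using the drug-content figures from the transparency report.

def proactive_rate(machine, manual):
    return machine / (machine + manual)

q4_2018 = proactive_rate(449_906, 122_493)
q3_2019 = proactive_rate(4_300_000, 105_600)

print(f"Q4 2018: {q4_2018:.1%}")  # 78.6%
print(f"Q3 2019: {q3_2019:.1%}")  # 97.6%
```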

Terror Data

Twitter also produces a Transparency report and also shares information about content violations, but in most categories lags far behind Facebook on automation.  Twitter's latest transparency report says that "more than 50% of Tweets we take action on for abuse are now being surfaced using technology. This compares to just 20% a year ago."  The one category where they seem to be doing much better than that is terrorism.  Their last report covered the period January to June 2019.  Twitter does not share statistics about drug sales content, but does have Terrorism information.  During this period, 115,861 accounts were suspended for violations related to the promotion of terrorism.  87% of those accounts were identified through internal tools.  

Facebook doesn't share these numbers by unique accounts, but rather by the POSTS that have been actioned.  In the Q3 2019 data, Facebook actioned 5.2 million pieces of terror content.  98.5% of those posts were machine identified.

Tuesday, November 12, 2019

'Tis the Season for SCAMS!

A recent project that DarkTower worked on was related to fraudulent marketplaces offering too-good-to-be-true deals on electronics.  DarkTower's CEO Robin Pugh took those lessons and applied them to a recent online shopping experience ... I asked her to write it up for our blog:

As I was browsing some of my favorite Instagrammers this morning, one of them posted about a great coffee system that was on price rollback for $99 – nearly half off the list price of $179.99.  As a coffee lover AND a bargain lover, I was immediately interested and began searching for more information.  Since I wasn’t familiar with how this particular coffee system worked, I typed the model name in my Google search bar, intending to find some YouTube videos on how it worked, but since I left my search term fairly broad, some interesting sites popped up in my search results.
RED FLAG #1: Prices that are TOO good

WOW!  An even BIGGER BARGAIN… more than $10 less than the Walmart price?!  But it’s on a site I’ve never heard of, “Juli Shop,” so I began to take a closer look at the site, since we all know a) it’s hard to beat a Walmart price and b) if it’s too good to be true….  Well, you can finish that sentence.  (Other kitchen appliances on the site also had crazy discounts.  The "DeLonghi Dedica EC680 15 Bar Stainless Steel Slim Espresso" machine is only $160.99 at Juli Shop, but $299.99 at Bed Bath & Beyond and BestBuy, and $241 elsewhere.)

RED FLAG  #2:  Same Day delivery

Among the things I notice about Juli Shop: in the list of things they promote about their site is “Same Day Delivery.”  Really?  Same Day?  So where are they located that they can promise same day delivery?
They purport to be in Citronelle, Alabama, with a local phone number; so I looked up the address on Google Maps and found that it’s a lovely 2 BR/2 BA brick ranch home that’s not currently for sale. The phone number – brace yourself – is disconnected. But they’ll definitely get me my Ninja Coffee Bar System today.

RED FLAG #3: Spelling Errors
I also notice in the menu bar that they want to tell me “Abouts Us”. Other sections of the menu are labeled "INFOMATION" and "CUSTORMER." Well, spelling errors are often a hallmark of scam sites and phishing emails, so I click to learn more “Abouts” them.
RED FLAG #4:  Information clearly copied from another site
Oddly, their About Us page has no mention of Juli Shop.  It is 100% about a fashion apparel company called Madison Island, and Juli Shop has no apparel merchandise at all.  Let’s check out Madison Island to see if it’s an affiliate, or maybe a parent company.

A quick search for Madison Island reveals that it is a fictitious demo store used to test Magento, a popular e-commerce platform, which Juli Shop uses to process its credit card transactions. By the way, Magento is targeted by one of the most prevalent malware families, called Magecart.  Magecart is designed specifically to steal credit card credentials.  So let’s think of the possibilities here:  a scam site that takes your money and never delivers the promised item AND steals your credit card information at the same time.  That’s quite a criminal enterprise!

RED FLAG #5:  Sanity check
At this point, all signs point toward a scam site, and I’m pretty sure I’m going to be paying $10 more for my Ninja Coffee Bar; but before I move on, I check out Scamadviser.
They give Juli Shop a 66% “TrustScore”, which puts it squarely in the “green” zone; but after reading the negative/positive comments, I’m not sure I agree.  First, the website was established 21 days ago.  The server is used by multiple websites, which isn’t uncommon for a small site, but they are offering items and services that are not typical of a small site.  Additionally, and quite concerning, the setup involves both the US and Vietnam.  A multi-country setup is not common for a small site, and somehow Vietnam doesn’t jibe with Citronelle, Alabama.

Further review of the data shows conflicting information around the site’s infrastructure, but also shows that there are no comments or reviews on typical review sites like Sitejabber and Trustpilot. The absence of this information is quite telling.

Scamadviser may give this site a 66% trust rating.  I’m giving it a 100% SCAM rating.

As the Christmas cyber shopping season is upon us, before you shop at a new online store, take the time to thoroughly review the site.  As demonstrated above, a few key checks and some attention to red flags can quickly reveal whether you should be entering your credit card information there, and whether it may leave Santa with an empty sack on Christmas Eve.

Saturday, November 09, 2019

Business Email Compromise (#BEC) Email Forwarding In Action

DarkTower President Robin Pugh was chatting with a friend who is the VP of Operations for her family business.  She mentioned as an aside that their email had been hacked, and of course, Robin’s cybercrime-fighter ears perked up.  The friend went on to explain that one of her clients, a global, Fortune 500 company, had called her to confirm email instructions from the company to start making payments into a different bank account.  But, of course, those were not legitimate instructions.

The screenshot below shows part of an email thread between her customer and the criminal using the compromised account.  What you cannot tell due to the redactions is that a cybercriminal had control of an account at the company; he messaged all customers to change the remittance instructions.  Even when the customer responded by email to confirm that these were legitimate instructions, the criminal assured the customer that the instructions were correct. 

However, the customer noticed some spelling and grammar discrepancies in the response and finally called the vendor to confirm.  Once alerted to the email compromise, the VP immediately changed the password to secure the email account.  This is certainly a "Best Practice" when responding to a phishing incident.  

But having spent time listening to Gary and Heather talk so much about Business Email Compromise, Robin knew to advise her friend to check one more thing…forwarding rules in the email client.  

After navigating in the email client to the Rules section, the VP found that a rule had been created to forward any messages mentioning the words “wire instructions,” “wire transfer,” “fund transfer,” “payment,” or “invoice” to the address blessingsalways823 at gmail dot com.

"If the message includes specific words in the subject or body 'wire instructions' or 'wire transfer' or 'funds transfer' or 'payment' or 'invoice'; forward the message to blessingalways823 at"

Even though Robin’s friend had already changed the email account password, the criminals were able to continue viewing and intercepting the email messages that were important to them.

The next steps were then to disable the rule, have I.T. check other users in the email domain for malicious forwarding rules, and then begin the process of notifying clients. 

A DarkTower investigation revealed that the Gmail account was used to register the domain name on 9/13/18, for which the registration details reveal the name and address Anthony L. Ania, 34501 Southside Park Dr, Solon, OH, 44139, phone 813-856-5005, and fax 650-253-0000.  The domain has never had a website and was probably used to impersonate an executive of Alpan Lighting Products, a company in California that uses the domain name  The address in Ohio may belong to a Cleveland attorney who has suffered identity theft, but there are at least three Nigerian profiles on Facebook using the same name, and the Google account password recovery process reveals that a phone number ending in 05 is tied to the Gmail account.

The criminal’s Gmail account was also seen on two boat sales websites, sailboatlistings dot com and powerboatlistings dot com, in lists of suspicious email addresses.

Lessons Learned:
1) Simply changing the password did not secure the account. 
2) Never confirm suspicious emails by replying to the suspicious email.
3) Regularly check rules in email accounts of your domain.

Tuesday, November 05, 2019

A Phish That Scans For Viruses

While I was on the train today I was checking email and found that I had received an interesting phish.  It was sent to an email address I haven't used in years that apparently still forwards:

I certainly didn't want to miss my "incomming" fax, so I of course needed to click the link to "Preview Fax Message." 

The phish started off going to "outlake-q.hopto[.]com" and passing my email address as a parameter in the URL.  I changed that up a bit as you'll see below.  The HopTo address claims it is "Connecting to OneDrive" but it's really forwarding to the rest of the phish.

"Leak-weave[.]gq" says "Please wait ..." while it continues connecting to OneDrive I guess. . . ?
Once it connects to OneDrive (which apparently is now hosted at leak-weave) it asks me to "Please hold a while" as "OneDrive Security is scanning your file for virus!" 

Great news!  No Virus detected on file!

"Scan Complete!  Your file is secure and safe for download. Office365 OneDrive."  So I guess I can Download the file, right?

Not so fast!  First we have to confirm the password for "" 

It takes the time to actually connect to the mail server and concludes that I have entered an "Invalid password"

No file for you!

Now, if a visitor actually believed there was a file, they may have been tempted to provide their REAL password at this time.  I don't know if that would result in a Download or not, but I've decided not to find out!

Hope you enjoyed today's Adventure in Phishing!  Tune in next time to see .  .  . well, we don't know what yet.

Friday, November 01, 2019

A Targeted (?) Phish from a LinkedIn Connection

This morning while I was on the Exercise Bike at the UAB Rec Center I got a LinkedIn message from a colleague I haven't spoken to in a couple years.

That was actually the SECOND funny thing about my LinkedIn profile this morning.  The first one was that, since I'm a Premium Member, I get notified when people check out my Profile there.  I had one unusual visitor:

彭家’s Profile
Peng Jia has a TOTALLY BLANK LinkedIn profile.家-彭-334485167

I sent John a text message on his phone, but followed up, knowing I was likely talking to a scammer, with a LinkedIn Reply:

Well, since it was "really" from John, I finished my 10 miles on the bike, showered, ran back to my office and fired up a VM to visit his link:

Gee, what was I worried about?  It's totally from John!  It says right there!

Of course, some might find it odd that the "View Message Folder" link takes me to the URL 
" eone [.] ga /mm/business/proposal/afzz "

Now this is where "Targeting" comes in ... Take a look at this Phishing website and try to think what industry might be targeted by this LinkedIn-propagated phishing campaign?  

Hmmmm... AstraZeneca, Procter & Gamble, Boston Scientific (who makes Medical Devices), GE (who has a GE HealthCare line that makes many medical devices), and Nationwide Insurance (who offers Health insurance plans)?  Looks like they are targeting the HealthCare sector.  But Pandora? (Update: I'm told that Pandora is the name of a system used for doing data analytics to detect diverted pharmaceutical products.  I don't think this is their logo, but it is likely that the Pandora reference is to that.)

At first I was confused why a phish possibly targeting HealthCare would be coming after me, but then I realized ... I'M EMPLOYED BY UAB -- The University of Alabama at Birmingham -- one of the largest and best funded research hospitals in America!  Any chance that I'm getting LinkedIn spam because - to a casual observer - I'm a HOSPITAL employee?  And then having this tagged up with at least three health care logos?  Ok, so what happens next?

Well, then they steal your email and password ... 

Gmail was the only one that had a second page ... if you entered Gmail it then wanted your phone number too.

We grabbed the phishing kit, because that's what we do, and took a browse around.

All of the individual files have the same "Action" -- which is to call Finish . php

Finish is where the entered information gets mixed with environmental variables from your machine and all of the details get emailed to the criminal.

The last piece is that the email address is referred to by a variable name that isn't in this PHP file:

If you scroll back to the top of "Finish . PHP" you'll find what you need there:

The top line shows which additional files should be loaded by the phish.  "CONTROLS" is the one we want, which is where we find the criminal's email address: 

It would make a great example for my students if anyone in Law Enforcement cared about this ... but sadly, the only people who care are the LinkedIn Security Team, who had this account down so fast that by the time I responded with "So shall I call you Madiba?" my friend John's account had already been secured and gave me an error message.

A couple last funny notes ... the kit contains a file called "Netcraft_check.php" that checks to see if the user agent is "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727)" and refuses to load the page if it is.  Might want to update that user agent, Netcraft hunters.
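As described, the kit's check is nothing more than a literal string comparison against that one user agent.  The same idea, sketched in Python rather than the kit's PHP:

```python
# The kit's "Netcraft check," as described: a literal user-agent
# string comparison -- trivially defeated by changing the agent string.

BLOCKED_AGENT = ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; "
                 ".NET CLR 2.0.50727)")

def should_serve(user_agent):
    """Refuse to serve the page to the one known scanner agent."""
    return user_agent != BLOCKED_AGENT

print(should_serve(BLOCKED_AGENT))        # False
print(should_serve("Mozilla/5.0 (X11)"))  # True
```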

There is also a file "visitor_log.php" that gives away the fact that all of the visitors to this phish have their IP address, timestamp, and browser agent logged.   Because of this, we can tell that a bunch of people visited the phish from LinkedIn, because it adds "Mobile/15E148 [LinkedInApp]" to the end of the browser agent string.  As of this writing, only 127 unique visitors have been to the phish.  41 of them were browsing the phish from within the LinkedIn application on a Mobile Device.

Unfortunately for the Phisher, the poor fool put his phishing site behind CloudFlare, so the logged IP addresses are NOT the victims' IP addresses; they are all CloudFlare IPs.  Oh well.  Nice try, Mister Phisher.  (We'll shoot this to CloudFlare to terminate your hosting as well.)