Monday, November 18, 2019

Facebook's Transparency Report: (Expert) Supervised Machine Learning Works!

Last summer the BBC technology program "Click" came to visit the lab for a special called "Can Technology Solve the Opioid Crisis?"  One of the points we stressed with @NickKwek was that when we report opioid and fentanyl-related posts to Facebook, the objective is not to take down THAT POST, but rather to help Facebook's automated tools update their models of what offensive drug sales content looks like.

Last week we had an opportunity to see what that looks like in action as Facebook released their transparency report for Q3 2019.  Facebook's Transparency report is divided into two major sections, each of which has two subsections. "Enforcement of our Standards" covers "Community Standards Enforcement" and "Intellectual Property Infringement."  The other major section, "Legal Requests," is divided into "Government Requests for User Data" and "Content Restrictions Based on Local Law."

The November 2019 transparency report for Community Standards looks at ten categories of content on Facebook and four categories of content on Instagram.

In this post, we'll look primarily at the statistics for "Regulated Goods: Drugs and Firearms" but the other categories on Facebook are:

  • Adult Nudity and Sexual Activity
  • Bullying and Harassment
  • Child Nudity and Sexual Exploitation of Children
  • Fake Accounts
  • Hate Speech
  • Spam
  • Terrorist Propaganda
  • Violent and Graphic Content
  • Suicide and Self-injury
On Instagram, the other categories are:
  • Child Nudity and Sexual Exploitation of Children
  • Suicide and Self-injury
  • Terrorist Propaganda
Facebook has previously shared details of their work to reduce terrorist content on the platform.  See their "Hard Questions" blog post -- "Are We Winning the War on Terrorism Online."  In this most recent report, they share that "Our proactive rate for detecting content related to al-Qaeda, ISIS and their affiliates remained above 99% in Q2 and Q3 2019, while our proactive rate for all terrorist organizations in Q2 and Q3 2019 is above 98%."

What does that mean?  It means that through the power of machine learning, when someone posts content trying to "express support or praise for groups, leaders, or individuals involved in terrorist activities" the content is removed automagically without the need for anyone to report it 98-99% of the time!

They've also previously discussed our partnership regarding the Opioid Crisis.  See their post "Supporting Our Community in the Face of the Opioid Epidemic."

As Facebook has focused on identifying drug-related content, the number of detections has risen.  That's likely for two reasons: first, they are now discovering content that previously would have gone unreported; and second, frustrated users are attempting to post their drug sales information in more and more ways to get past the blocks -- and largely failing to do so.

Drug related posts actioned:
  • 572,400 posts in Q4 2018
  • 841,200 posts in Q1 2019 
  • 2,600,000 posts in Q2 2019 
  • 4,400,000 posts in Q3 2019
When I attended Facebook's Faculty Summit all the way back in 2016, they had me hooked from the very beginning of the day, when Facebook's Engineering Director Joaquin Quiñonero Candela gave his opening keynote.  All of the amazing machine learning technology that people like Dr. Quiñonero Candela had created to help improve online ad delivery was ALSO being used to make the platform as safe as possible against a wide variety of threats.

I was especially excited to learn about the work of Wendy Mu. At the time, Wendy's bio said "Wendy is an engineering manager on Care Machine Learning, which leverages machine learning to remove abusive content from the site.  Over the last three years at Facebook, she has also worked on Site Integrity, Product Infrastructure, and Privacy."  Wendy and her team are inventing and patenting new ways of applying machine learning to this problem space.  Nektarios Leontiadis, "a research scientist on the Threats Infrastructure Team" with a PhD in online crime modeling and prevention from Carnegie Mellon, and Jen Weedon, previously at FireEye, were some of the other folks I met there who made such a profound impression on me!

Since then, the UAB Computer Forensics Research Lab has partnered with Facebook on many projects, and quite a few have taken the form of answering "what would a human expert label as offending content in this threat space?"

This is where "supervised machine learning" comes into play.  

The simplest version of supervised machine learning is the "I am not a Robot" testing that Google uses to label the world.  You may be old enough to remember when Google perfected their Google Books project by asking us to human-label all of the unreadable words that their scanners lifted from old books but which were not properly recognized by their OCR algorithm.  Then we were asked to label the address numbers found on buildings and mailboxes, and later to choose cars, bicycles, traffic lights, and more recently crosswalks, as it seems we are now teaching future self-driving cars how not to drive over pedestrians.

This works well for "general knowledge" types of supervised learning.  Anyone over the age of three can fairly reliably tell the difference between a Cat and a Dog.  That is the most common example people reach for when they talk about supervised machine learning, usually illustrated with a "Convolutional Neural Network."  Do a search on "machine learning cat dog" and you'll find ten thousand example articles, such as this image from Booz Allen Hamilton.

Booz Allen Hamilton infographic 
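As a minimal sketch of the supervised learning idea, here is a toy classifier in Python: labeled examples play the role of the training data, and a simple nearest-centroid rule stands in for the convolutional network. The features ("ear pointiness," "snout length") are invented purely for illustration.

```python
# Toy supervised learning: labeled examples train a model that then
# classifies unseen inputs. Real systems use image pixels and deep
# networks; here a nearest-centroid rule illustrates the principle.

def train_centroids(examples):
    """Compute the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        counts[label] = counts.get(label, 0) + 1
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def classify(centroids, features):
    """Assign the label whose centroid is nearest (squared distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Hypothetical features: [ear pointiness, snout length]
labeled = [([0.9, 0.2], "cat"), ([0.8, 0.3], "cat"),
           ([0.2, 0.9], "dog"), ([0.3, 0.8], "dog")]
model = train_centroids(labeled)
print(classify(model, [0.85, 0.25]))  # → cat
```

The "supervision" is entirely in the labels: the algorithm never knows what a cat is, only which feature patterns humans have tagged "cat."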


We're working on something slightly different, in that the labeling requires more specialized knowledge than "Cat vs. not Cat".   Is this chemical formula a Fentanyl variant?  Is the person in this picture the leader of a terrorist organization?  What hashtags are opioid sellers using to communicate with one another once their 100 favorite search terms are being blocked by Facebook and Instagram?

Facebook Research has a nice set of videos that explain some of the basics of Machine Learning that are shared as part of the "Machine Learning Academy" series:

from: https://research.fb.com/videos/field-guide-to-machine-learning-lesson-1-problem-definition/
In this chart, the data provided by UAB primarily feeds the "Data Gathering" stage: by bringing forensic drug chemists into the lab, we're able to provide a more sophisticated set of "labelers" than the general public.  Part of our "Accuracy testing" then comes in on the other end.  After the model built from our data (and the data from other reporters) is put into play, does it become more difficult for our experts to find such content online?

Looking at the Transparency Report's Community Standards section, the results are looking really great!  


In the fourth quarter of 2018, only 78.6% of the offending drug content at Facebook was being removed by automation.  The remaining 21.4% didn't get deleted until a user reported it by clicking through the content reporting buttons.  By the third quarter of 2019, 97.6% of offending drug content at Facebook was removed by automation!

In Q4 2018, 122,493 pieces of drug content were "manually reported" while 449,906 pieces were "machine identified."

In Q3 2019, 105,600 pieces of drug content were "manually reported", but now about 4.3 million pieces were "machine identified."  
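The "proactive rate" in these reports can be reproduced from those figures: it is simply the machine-identified count divided by the total actioned. A quick check in Python, using the numbers quoted above:

```python
# Proactive rate = machine-identified / (machine-identified + manually
# reported), using the figures from Facebook's transparency report.

def proactive_rate(machine, manual):
    return machine / (machine + manual)

q4_2018 = proactive_rate(449_906, 122_493)    # → ~0.786 (78.6%)
q3_2019 = proactive_rate(4_300_000, 105_600)  # → ~0.976 (97.6%)
print(f"Q4 2018: {q4_2018:.1%}, Q3 2019: {q3_2019:.1%}")
```

Both values match the percentages in the report, which is a nice sanity check on how the metric is defined.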

Terror Data

Twitter also produces a Transparency report and shares information about content violations, but in most categories it lags far behind Facebook on automation.  Twitter's latest transparency report says that "more than 50% of Tweets we take action on for abuse are now being surfaced using technology. This compares to just 20% a year ago."  The one category where they seem to be doing much better than that is terrorism.  Their last report covered the period January to June 2019.  Twitter does not share statistics about drug sales content, but does have terrorism information.  During this period, 115,861 accounts were suspended for violations related to the promotion of terrorism, and 87% of those accounts were identified through internal tools.

Facebook doesn't share these numbers by unique accounts, but rather by the POSTS that have been actioned.  In the Q3 2019 data, Facebook actioned 5.2 million pieces of terror content.  98.5% of those posts were machine identified.





Tuesday, November 12, 2019

'Tis the Season for SCAMS!

A recent project that DarkTower worked on was related to fraudulent marketplaces offering too-good-to-be-true deals on electronics.  DarkTower's CEO Robin Pugh took those lessons and applied them to a recent online shopping experience ... I asked her to write it up for our blog:

As I was browsing some of my favorite Instagrammers this morning, one of them posted about a great coffee system that was on price rollback at Walmart.com for $99 – nearly half off the list price of $179.99.  As a coffee lover AND a bargain lover, I was immediately interested and began searching for more information.  Since I wasn't familiar with how this particular coffee system worked, I typed the model name into my Google search bar, intending to find some YouTube videos on how it worked; but since I left my search term fairly broad, some interesting sites popped up in my search results.

https://julishopgame.com/index.php/ninja-coffee-bar-system-cf097.html
RED FLAG #1: Prices that are TOO good

WOW!  An even BIGGER BARGAIN… more than $10 less than the Walmart.com price?!  But it's on a site I've never heard of, "Juli Shop," so I began to take a closer look, since we all know a) it's hard to beat a Walmart price and b) if it's too good to be true….  Well, you can finish that sentence.  (Other kitchen appliances on the site also had crazy discounts.  The "DeLonghi Dedica EC680 15 Bar Stainless Steel Slim Espresso" machine is only $160.99 at Juli Shop, but $299.99 at Bed Bath & Beyond and Best Buy, and $241 at Walmart.com.)


RED FLAG  #2:  Same Day delivery

Among the things Juli Shop promotes about their site is "Same Day Delivery."  Really?  Same Day?  So where are they located that they can promise same day delivery?

https://julishopgame.com/index.php/contacts/
They purport to be in Citronelle, Alabama, with a local phone number; so I looked up the address on Google Maps and found that it’s a lovely 2 BR/2 BA brick ranch home that’s not currently for sale. The phone number – brace yourself – is disconnected. But they’ll definitely get me my Ninja Coffee Bar System today.

RED FLAG #3: Spelling Errors
I also notice in the menu bar that they want to tell me “Abouts Us”. Other sections of the menu are labeled "INFOMATION" and "CUSTORMER." Well, spelling errors are often a hallmark of scam sites and phishing emails, so I click to learn more “Abouts” them.

https://julishopgame.com/index.php/about-us/
RED FLAG #4:  Information clearly copied from another site
Oddly, their About Us page has no mention of Juli Shop.  It is 100% about a fashion apparel company called Madison Island, and Juli Shop has no apparel merchandise at all.  Let’s check out Madison Island to see if it’s an affiliate, or maybe a parent company.

A quick search for Madison Island reveals that it is a fictitious demo store used to demonstrate Magento, a popular e-commerce platform, which Juli Shop uses to process its credit card transactions. By the way, Magento is targeted by one of the most prevalent malware families, called Magecart.  Magecart exists specifically to steal credit card details.  So let's think of the possibilities here:  a scam site that takes your money and never delivers the promised item AND steals your credit card information at the same time.  That's quite a criminal enterprise!

RED FLAG #5:  Sanity check
At this point, all signs point toward a scam site, and I’m pretty sure I’m going to be paying $10 more for my Ninja Coffee Bar; but before I move on, I check out scamadviser.com.
https://www.scamadviser.com/check-website/julishopgame.com/index.php/about-us
They give Juli Shop a 66% "TrustScore", which puts it squarely in the "green" zone; but after reading the negative and positive comments, I'm not sure I agree.  First, the website was established just 21 days ago.  The server is shared by multiple websites, which isn't uncommon for a small site, but they are offering items and services that are not typical of a small site.  Additionally, and quite concerning, the setup involves both the US and Vietnam.  A multi-country setup is not common for a small site, and somehow Vietnam doesn't jibe with Citronelle, Alabama.

Further review of the scamadviser.com data shows conflicting information around the site’s infrastructure, but also shows that there are no comments or reviews on typical review sites like Sitejabber and Trustpilot. The absence of this information is quite telling.

Scamadviser may give this site a 66% trust rating.  I’m giving it a 100% SCAM rating.
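For what it's worth, the red flags above lend themselves to a simple checklist scorer. The flags and weights below are invented for illustration only and are not ScamAdviser's (or anyone's) actual methodology:

```python
# Hypothetical red-flag checklist for evaluating an unfamiliar online
# store. Weights are illustrative, not any vendor's real scoring model.

RED_FLAGS = {
    "price_far_below_retail": 3,
    "impossible_delivery_promise": 2,
    "disconnected_or_fake_address": 3,
    "spelling_errors": 1,
    "copied_about_page": 3,
    "domain_under_30_days_old": 3,
    "no_third_party_reviews": 2,
}

def scam_score(observed):
    """Sum the weights of the red flags observed on a site."""
    return sum(RED_FLAGS[flag] for flag in observed)

# Juli Shop hits every flag described in this post.
juli_shop = list(RED_FLAGS)
print(scam_score(juli_shop))  # → 17 of a possible 17
```

The point is less the arithmetic than the habit: check each flag deliberately rather than trusting a single aggregate score.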

As the Christmas cyber shopping season is upon us, take the time to thoroughly review any new online store before you shop there.  As demonstrated above, a few key checks and some attention to red flags can quickly reveal whether you should be entering your credit card information, or whether doing so may leave Santa with an empty sack on Christmas Eve.

Saturday, November 09, 2019

Business Email Compromise (#BEC) Email Forwarding In Action


DarkTower President Robin Pugh was chatting with a friend who is the VP of Operations for her family business.  She mentioned as an aside that their email had been hacked, and of course, Robin’s cybercrime-fighter ears perked up.  The friend went on to explain that one of her clients, a global, Fortune 500 company, had called her to confirm email instructions from the company to start making payments into a different bank account.  But, of course, those were not legitimate instructions.

The screenshot below shows part of an email thread between her customer and the criminal using the compromised account.  What you cannot tell due to the redactions is that a cybercriminal had control of an account at the company; he messaged all customers to change the remittance instructions.  Even when the customer responded by email to confirm that these were legitimate instructions, the criminal assured the customer that the instructions were correct. 




However, the customer noticed some spelling and grammar discrepancies in the response and finally called the vendor to confirm.  Once alerted to the email compromise, the VP immediately changed the password to secure the email account.  This is certainly a "Best Practice" when responding to a phishing incident.  

But having spent time listening to Gary and Heather talk so much about Business Email Compromise, Robin knew to advise her friend to check one more thing…forwarding rules in the email client.  

After navigating in the email client to the Rules section, the VP found that a rule had been created to forward any messages mentioning the words "wire instructions," "wire transfer," "funds transfer," "payment," or "invoice" to the address blessingalways823 at gmail dot com.



"If the message includes specific words in the subject or body 'wire instructions' or 'wire transfer' or 'funds transfer' or 'payment' or 'invoice'; forward the message to blessingalways823 at gmail.com."


Even though Robin’s friend had already changed the email account password, the criminals were able to continue viewing and intercepting the email messages that were important to them.

The next steps were then to disable the rule, have I.T. check other users in the email domain for malicious forwarding rules, and then begin the process of notifying clients. 
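That domain-wide check for malicious forwarding rules can be sketched programmatically. This is a minimal illustration using a hypothetical rule format; a real audit would pull rules through the mail platform's admin tooling (for example, Exchange admin cmdlets or the Gmail API):

```python
# Sketch: flag mailbox rules that forward messages matching financial
# keywords to an address outside the organization's domain -- the
# classic BEC persistence pattern described above.

FINANCIAL_KEYWORDS = {"wire instructions", "wire transfer",
                      "funds transfer", "payment", "invoice"}

def suspicious_rules(rules, org_domain):
    """Return the names of rules that look like BEC forwarding rules."""
    flagged = []
    for rule in rules:
        external = any(not addr.lower().endswith("@" + org_domain)
                       for addr in rule.get("forward_to", []))
        financial = any(kw in " ".join(rule.get("conditions", [])).lower()
                        for kw in FINANCIAL_KEYWORDS)
        if external and financial:
            flagged.append(rule["name"])
    return flagged

rules = [
    {"name": "BEC forward",
     "conditions": ["subject or body contains 'wire transfer' or 'invoice'"],
     "forward_to": ["blessingalways823@gmail.com"]},
    {"name": "Archive newsletters",
     "conditions": ["from contains 'news@'"],
     "forward_to": ["archive@example.com"]},
]
print(suspicious_rules(rules, "example.com"))  # → ['BEC forward']
```

Even a crude filter like this surfaces the exact rule found in this incident, which is why "check the forwarding rules" belongs in every phishing response checklist.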

A DarkTower investigation revealed that the Gmail account was used to register the domain name alpan.us on 9/13/18, for which the registration details reveal the name and address Anthony L. Ania, 34501 Southside Park Dr, Solon, OH, 44139, phone 813-856-5005, and fax 650-253-0000.  The domain has never had a website and was probably used to impersonate an executive of Alpan Lighting Products, a company in California that uses the domain name alpan.com.  The address in Ohio may belong to a Cleveland attorney who has suffered identity theft, but there are at least three Nigerian profiles on Facebook using the same name, and the Google account password recovery process reveals that a phone number ending in 05 is tied to the Gmail account.



The criminal’s Gmail account was also seen on two boat sales websites, sailboatlistings dot com and powerboatlistings dot com, in lists of suspicious email addresses.

Lessons Learned:
1) Simply changing the password did not secure the account. 
2) Never confirm suspicious emails by replying to the suspicious email.
3) Regularly check rules in email accounts of your domain.