10 Billion Records and Deepfakes

Now that 2018 is history, I decided to see what happened last year in the way of data breaches. Ransomware and cryptomining got a lot of press, while breaches faded into the background, the way a really loud noise that never stops eventually fades from perception.

Since it seems to be a challenge to find a definitive list of the worst data breaches, I decided to make my own. I have 50 breaches on my list, most from 2018 – with a few older ones listed just because of their size. Below are the top 20.

Entity                         # of Records

  1. Yahoo (2016)              3,000,000,000
  2. Aadhaar                   1,100,000,000
  3. Onliner Spambot             711,477,622
  4. Exploit.In                  593,427,119
  5. Marriott Starwood           500,000,000
  6. Anti Public List            457,962,538
  7. River City Media            393,430,309
  8. MySpace                     359,420,698
  9. Exactis                     340,000,000
  10. Twitter                    330,000,000
  11. NetEase                    234,842,089
  12. LinkedIn                   164,611,595
  13. Adobe                      152,445,165
  14. Under Armour               150,000,000
  15. MyFitnessPal               150,000,000
  16. Facebook                   147,000,000
  17. Equifax (2017)             145,500,000
  18. Exactis                    131,577,763
  19. Apollo                     125,929,660
  20. Quora                      100,000,000

The sources I used are listed online. For my top 50, the total number of records breached came to just under 10 billion. I don’t know if I am typical, but my wife and I together have logins for about 300 different accounts (banks, travel sites, hotels, airlines, etc.). With that many accounts per person, any one of us can easily show up in several breaches. Estimates of how many people have internet access differ, ranging from 34% to 43% of the world population, or just about 3 billion people. Ten billion breached records spread across 3 billion people is roughly three records apiece, so statistically everyone has been compromised. Of course, statistics can give a false picture. You may be one of the fortunate ones who has not been compromised, at least as far as you know.
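For anyone who wants to check the back-of-the-envelope math, here is a rough sketch in Python. It adds up the 20 record counts listed above and divides the round figures from this post (just under 10 billion records, about 3 billion people with internet access) to get a records-per-person ratio. The variable and function names are my own, and both round figures are estimates, so treat the result as a ballpark only.

```python
# Record counts for the top 20 breaches, in the order listed above.
TOP_20_RECORDS = [
    3_000_000_000, 1_100_000_000, 711_477_622, 593_427_119, 500_000_000,
    457_962_538, 393_430_309, 359_420_698, 340_000_000, 330_000_000,
    234_842_089, 164_611_595, 152_445_165, 150_000_000, 150_000_000,
    147_000_000, 145_500_000, 131_577_763, 125_929_660, 100_000_000,
]

# Round figures taken from this post; both are estimates, not measurements.
TOTAL_TOP_50_RECORDS = 10_000_000_000  # "just under 10 billion"
INTERNET_USERS = 3_000_000_000         # "just about 3 billion people"


def records_per_user(total_records, users):
    """Average number of breached records per person with internet access."""
    return total_records / users


top_20_total = sum(TOP_20_RECORDS)
ratio = records_per_user(TOTAL_TOP_50_RECORDS, INTERNET_USERS)

print(f"Top 20 total: {top_20_total:,}")
print(f"Breached records per internet user: {ratio:.1f}")
```

The top 20 alone account for about 9.3 billion of the records, and the ratio comes out to a little over three breached records for every person online.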

Getting compromised is not a matter of IF, but WHEN, and maybe HOW OFTEN. To find out if you’ve been affected, go to a breach-lookup site and enter your email address. It will tell you whether your email address has been exposed and in which breaches. You can then change the password for each account involved in a specific breach to help secure your data. Unfortunately, it’s not just account access that has been compromised; in many cases additional information about you has been stolen: mother’s maiden name, home address, home and mobile phone numbers, financial status, marital status, and a host of other tidbits. Why is this important?

I haven’t seen this mentioned yet, but I am extrapolating from the existence of “deepfakes.” If you missed them, deepfakes apply machine learning and AI (artificial intelligence) to do what is essentially photoshopping of video. Actors can be placed into movies they never made. Politicians can be made to say things they never said. There is even software that lets you type what you want a person to say and produces a voice that convincingly mimics them saying it. This obviously jeopardizes our ability to tell real video from fake. But what does this have to do with data breaches?

Deepfakes come about from giving an AI enough samples of real data about a person (voice samples, still pictures, video) to allow it to transpose one person onto another person and do it convincingly. What happens if you have thousands of pieces of data about a person? Can you start to predict what they will buy, where they will go? Can you get enough information to successfully access their accounts by being able to answer security questions because you know almost all of them? As more and more of us and our lives get stored online, more of this information is available when breached.

Determining whether or not a video is fake can be done (at least for now) if the original source is available. If all you have is a copy of unknown origin, the process is next to impossible. There is some discussion of training AIs to detect fakes by giving them samples of known-good and known-fake video. There is even some thought that important videos might be stored in blockchains to try to make sure a valid copy exists.

That doesn’t help your data. If someone has your driver’s license number in combination with other data, along with some rudimentary document-making skills, they can make fake identification documents and start doing business as you. As the skill needed to do illegal things keeps dropping, this kind of fraud may grow to the point where it is really hard to convince anyone that the person who opened that line of credit was not you. (If you have ever had to convince the IRS that the e-filing on your taxes was fraudulent, you know how hard this can be.)

For now, be very active in monitoring your credit and passwords. If you hear about a breach, change your password. Freeze your credit reports. Be extremely careful about how much information you give anyone. Don’t answer security questions accurately; just record your answers and don’t reuse them. It’s a lot of work, but the consequences of a compromise are not pleasant.
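If making up security answers by hand sounds tedious, a few lines of Python can do it for you. This is only a sketch of one way to go about it: the tiny word list, the function names, and the "example-bank" site are all made up for illustration, and a real word list should be far larger. It uses Python's standard `secrets` module, which is designed for security-sensitive random choices.

```python
import secrets

# Hypothetical, deliberately tiny word list for illustration only.
# A real list should contain thousands of words.
WORDS = [
    "amber", "bison", "cedar", "delta", "ember", "fjord", "gravel", "harbor",
    "indigo", "juniper", "kettle", "lantern", "marble", "nectar", "orchid", "pebble",
]


def random_answer(num_words=4):
    """Return a made-up answer built from randomly chosen words."""
    return " ".join(secrets.choice(WORDS) for _ in range(num_words))


def answers_for(site, questions):
    """Give every security question on one site its own random answer."""
    return {(site, question): random_answer() for question in questions}


if __name__ == "__main__":
    record = answers_for("example-bank", ["Mother's maiden name?", "First pet?"])
    for (site, question), answer in record.items():
        print(f"{site} | {question} | {answer}")
```

Whatever comes out, record it somewhere safe, such as a password manager, because you will never be able to guess these answers later. Generating a fresh set for each site is what keeps one breach from unlocking your other accounts.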
