
Cross-posting this from another site where I posted it:

I am glad to see that someone has said what I had suspected. I made a quip [1] about the ethics of charging people to find out whether they're affected by a breach. Full disclosure: I run Canary [2], a free service that lets you search for data that may or may not have been leaked. The only thing I intend to charge for is business use: if you want to link it into your SIEM or whatever, I want you to pay up. Otherwise, feel free to search manually.

1.2 billion compromised credentials is a lot. In fact, it's so much that I found it hard to believe. Adobe's breach was no more than 150 million user accounts, and if you look at various sources [3], you can infer that the overall number of affected users is anywhere between 250 and 400 million, depending on where or whom you look at (Wikipedia was the quickest to cite in this case). Needless to say, I call bullshit on this count, unless Hold Security's figure is inflated by an extremely large number of duplicates.

To put it all into context, Canary has about 476,000 e-mail addresses stored in its databases, along with about 350,000 potential passwords stored in hashed form. This data has been collected since the middle of last year from over 1.1 million unique samples. Collection is completely automated, so I'm not sitting on random Tor sites gathering it myself; it happens without me having to look at it.

If you're interested in contributing to the project, I'd love to hear from you. I'd like us to avoid having to rely on products like Hold Security's, because leaked data is something everyone should be able to check without paying a broker an enormous sum.

[1] http://canarypw.wordpress.com/2014/08/06/canary-will-not-cha...

[2] https://canary.pw/

[3] http://en.wikipedia.org/wiki/Data_breach



That's assuming that the leak is from a single source (single business or website).

Is it conceivable that all leaks, aggregated, would add up to the 1.X billion figure referenced?

The locations of hundreds of thousands of WordPress and similar CMS sites are well known, and these sites are constantly hit by botnets trying to brute-force passwords and waiting for admins to leave a site unpatched. I'm sure they get compromised all the time. I operate several honeypots just for fun and have a list of thousands of sites that have malware on them and have undoubtedly had their databases stolen. Then anyone who runs a site (or has sufficient permissions on one) and appears in that stolen database can have their own site hacked, and the chain continues. If you delete the malware but fail to fix the security hole, the malware will be back on that site the next day.

Why does your Canary site not include larger leaks like the Adobe leak?

I think sites like yours are very important for protecting people's online security. Data mining is fun, and being able to offer a service that helps people stay safe online is a win-win.


Canary does not include data from the Adobe leak because I just haven't bothered. I have considered adding it, but since other solutions are better suited to that purpose [1], I'd rather leave it to them and, in the future, link to them from Canary's site.

I may revisit this later, but for now it's not a priority. It's a feature I'd like to have; I just don't have a practical way to do it yet.

Regarding the count, if you go through all the leaks, the total barely gets above 250,000,000, based on statistics from Verizon and other initiatives. My estimate is that the true figure is probably less than double that; it can't be much higher. It would take a few Twitter-, Adobe-, or Facebook-sized leaks to reach 1.2 billion: even if 50,000 WordPress sites were breached with an average of only 5 accounts each, that's 250,000 accounts, which barely makes a dent in the 250,000,000 I floated.
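
A quick sketch of the WordPress arithmetic, using only the hypothetical figures from the comment (50,000 breached sites, 5 accounts per site, a ~250 million baseline, and the 1.2 billion claim). The `sites_needed` line is an illustrative extra, assuming the same 5-accounts-per-site average holds.

```python
# Commenter's hypothetical scenario: many small WordPress breaches.
breached_sites = 50_000
accounts_per_site = 5
baseline_total = 250_000_000   # aggregate of known leaks floated in the comment
target_total = 1_200_000_000   # Hold Security's claimed figure

wordpress_accounts = breached_sites * accounts_per_site
share_of_baseline = wordpress_accounts / baseline_total
gap_to_target = target_total - baseline_total

# How many 5-account sites it would take to close the gap (illustrative only).
sites_needed = gap_to_target // accounts_per_site

print(f"WordPress accounts: {wordpress_accounts:,}")
print(f"Share of baseline: {share_of_baseline:.1%}")
print(f"Gap to 1.2 billion: {gap_to_target:,}")
print(f"Sites needed at 5 accounts each: {sites_needed:,}")
```

Under these assumptions the WordPress scenario adds 250,000 accounts (0.1% of the baseline), and closing the remaining 950 million gap would take 190 million such sites.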

If 1.2 billion accounts really were floating around, someone would likely have spoken up, and we'd have gotten wind of it by now.

Data mining is awesome. :)

[1] https://lastpass.com/adobe/


Adobe + Target + PlayStation leaks alone total over 300 million credentials.

AOL 2004 = 92 million

AOL 2006 = 20 million

Apple 2012 = 12 million

Blizzard 2012 = 14 million

BNY Mellon 2008 = 12.5 million

CardSystems Solutions 2005 = 40 million

Evernote 2013 = 50 million

GS Caltex 2008 = 11 million

Heartland 2009 = 130 million

LivingSocial 2013 = 50 million

RockYou 2009 = 32 million

T-Mobile 2006 = 17 million

TJ Maxx 2007 = 94 million

US DoD 2009 = 76 million

US Dept of Veterans Affairs 2006 = 26.5 million

Yahoo 2013 = 22 million

Those are only the ones over 10 million before 2014, and they leave out some foreign hacks like Auction.co.kr (18 million). Also not listed: UK NHS (8.3 million), LinkedIn (8 million), eBay, Target, and LexisNexis.

I promise you that the records from major company leaks that are publicly available add up to just over 1 billion. Yes, there are lots of duplicates between those leaks, but most contain valuable data beyond the credentials themselves.
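
A rough tally of the list above, as a sanity check on the "just over 1 billion" claim. The figures are the commenter's, not independently verified; years are left out of the labels, and the "over 300 million" for Adobe + Target + PlayStation is treated as a lower bound.

```python
# Breach sizes as listed in the comment (over 10 million, before 2014).
breaches = [
    ("AOL", 92_000_000),
    ("AOL", 20_000_000),
    ("Apple", 12_000_000),
    ("Blizzard", 14_000_000),
    ("BNY Mellon", 12_500_000),
    ("CardSystems Solutions", 40_000_000),
    ("Evernote", 50_000_000),
    ("GS Caltex", 11_000_000),
    ("Heartland", 130_000_000),
    ("LivingSocial", 50_000_000),
    ("RockYou", 32_000_000),
    ("T-Mobile", 17_000_000),
    ("TJ Maxx", 94_000_000),
    ("US DoD", 76_000_000),
    ("US Dept of Veterans Affairs", 26_500_000),
    ("Yahoo", 22_000_000),
]

listed_total = sum(count for _, count in breaches)

# Lower bound from "Adobe + Target + PlayStation ... over 300 million".
big_three_floor = 300_000_000
combined_floor = listed_total + big_three_floor

print(f"{len(breaches)} listed breaches: {listed_total:,} records")
print(f"Including Adobe + Target + PlayStation: at least {combined_floor:,}")
```

This comes to 699 million for the listed breaches and at least 999 million with the big three, before adding sub-10-million and foreign leaks or removing any duplicates.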

AT&T, BCBS, Citigroup, Facebook, Gap, Gawker, Chase, Medicaid, Monster.com, Network Solutions, Nintendo, Sega, Starbucks, Twitter, Ubisoft, Washington Post, and numerous government and academic organizations all have hundreds of thousands to millions of credentials publicly available.


I might be willing to concede this, but you also have to take the age of the data into account.

Let's look at this useful spreadsheet:

https://docs.google.com/spreadsheet/ccc?key=0AmenB57kGPGKdHh...

And here are the numbers:

Hacked: 880,575,016

Inside job: 137,714,840

Accidental: 63,322,485

Lost/stolen: 206,237,702

Misc: 1,825,350
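
Adding up the per-cause figures quoted from the spreadsheet shows where the "over 1.2 billion" total comes from. This is a minimal sketch using only the numbers above; the share calculation is an added illustration.

```python
# Records by cause, as quoted from the spreadsheet.
records_by_cause = {
    "Hacked": 880_575_016,
    "Inside job": 137_714_840,
    "Accidental": 63_322_485,
    "Lost/stolen": 206_237_702,
    "Misc": 1_825_350,
}

total = sum(records_by_cause.values())
hacked_share = records_by_cause["Hacked"] / total

print(f"Total records: {total:,}")
print(f"Share from hacking: {hacked_share:.1%}")
```

The total is 1,289,675,393 records, with hacking accounting for roughly two thirds (about 68%).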

Things like government-issued identification numbers and whatnot are the most severe, so how many people have been affected by those since 2004? If we break the totals down by data type, we start to see how serious these breaches have become:

E-mail addresses: 530,991,405

SSN/PII: 327,471,624

Credit card: 335,772,083

Authentication: 430,756,146

Bank records: 4,270,000

The first line is just a list of e-mail addresses; the latter four are the most severe. Out of those, which are the most useful? I'd wager SSN/PII, authentication, and bank records; credit cards are really only useful for so long.

That puts us at over 750,000,000 records that may be usable. However, the authentication portion becomes less useful as time goes on: credentials from 2004 may not work in 2014.
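
The "over 750,000,000" figure is the sum of the three categories singled out above. The sketch below reproduces it, and also sums all five categories, assuming both breakdowns (by cause and by data type) come from the same spreadsheet rows.

```python
# Records by data type, as quoted from the spreadsheet.
by_category = {
    "E-mail addresses": 530_991_405,
    "SSN/PII": 327_471_624,
    "Credit card": 335_772_083,
    "Authentication": 430_756_146,
    "Bank records": 4_270_000,
}

# The three categories the commenter judged most useful.
most_useful = ("SSN/PII", "Authentication", "Bank records")
usable = sum(by_category[k] for k in most_useful)

# Sum across every category, for comparison with the ~1.29 billion by-cause total.
all_categories = sum(by_category.values())

print(f"Possibly usable: {usable:,}")
print(f"Sum of all categories: {all_categories:,}")
```

This yields 762,497,770 possibly usable records. The five categories sum to 1,629,261,258, more than the by-cause total, which suggests (under that same-spreadsheet assumption) that a single breach often exposes several data types and is counted once per type.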

So yes, we have had over 1.2 billion records leaked, but how much of that is actually useful? And none of these figures take duplicates into account.
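
To illustrate how duplicates inflate raw totals, here is a toy deduplication over hypothetical leaks (all addresses are made up). The normalization (trimming and lowercasing) is a simplifying assumption: most mail providers ignore case, but the local part of an address can technically be case-sensitive.

```python
# Hypothetical toy leaks: each is a list of e-mail addresses.
leaks = {
    "leak_a": ["alice@example.com", "bob@example.com", "carol@example.com"],
    "leak_b": ["Bob@Example.com", "dave@example.com"],
    "leak_c": ["alice@example.com ", "erin@example.com"],
}

def normalize(email: str) -> str:
    # Trim whitespace and lowercase so trivially different copies collapse.
    return email.strip().lower()

raw_total = sum(len(addresses) for addresses in leaks.values())
unique = {normalize(e) for addresses in leaks.values() for e in addresses}

print(f"Raw records: {raw_total}, unique addresses: {len(unique)}")
```

Seven raw records reduce to five unique addresses here; summing per-leak counts, as the totals above do, overstates the number of distinct people affected.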


Useful to me? None.

Useful to spammers? Half a billion emails is very useful.

Useful to hackers? Hundreds of millions of records.

Useful for fraud? Hundreds of millions of records.

Useful for data-mining and intelligence? All of them.

Many banks don't issue new debit/credit card numbers; they just change the expiration date (in my personal experience, the 3-digit code is rarely used). It's easy to brute-force the expiration date.

SSNs and security-question answers can allow access to many accounts.

Cracking password hashes (there are lots of methods) is sometimes easy, sometimes hard.
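
One reason it is sometimes easy: unsalted hashes let an attacker hash a common-password list once and match every leaked hash with a single lookup. The sketch below contrasts plain SHA-256 with PBKDF2 and a per-user random salt; the wordlist and passwords are hypothetical, and the iteration count is an illustrative choice, not a recommendation.

```python
import hashlib
import os

def unsalted_hash(password: str) -> str:
    # Plain SHA-256: the same password always produces the same digest.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def salted_hash(password: str, salt: bytes, iterations: int = 100_000) -> str:
    # PBKDF2-HMAC-SHA256 with a per-user salt and many iterations.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations).hex()

# Hypothetical common-password list, hashed once ahead of time.
wordlist = ["123456", "password", "letmein", "qwerty"]
precomputed = {unsalted_hash(w): w for w in wordlist}

# Two leaked users who both chose "letmein": each is cracked with one lookup.
leaked_unsalted = [unsalted_hash("letmein"), unsalted_hash("letmein")]
cracked = [precomputed.get(h) for h in leaked_unsalted]
print(cracked)

# With random salts, identical passwords produce different stored hashes.
salt_a, salt_b = os.urandom(16), os.urandom(16)
leaked_salted = [salted_hash("letmein", salt_a), salted_hash("letmein", salt_b)]
print(leaked_salted[0] == leaked_salted[1])
```

Salting doesn't make weak passwords safe; an attacker can still guess per user. It does rule out precomputed tables and makes each guess cost many hash iterations.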

Bank account numbers rarely change, and damage can be done with them.

How many people use their real information online? Most.

How many use secure passwords and change their passwords ever/enough? Few.

The LexisNexis breach alone is a disaster; they're basically data miners with exclusive access to personal details.



