Backblaze Drive Stats for Q1 2024 (backblaze.com)
246 points by TangerineDream on May 2, 2024 | 94 comments


I, too, love Backblaze's reports. But they provide no information regarding drive endurance. I first became aware of this with SSDs, but HDD manufacturers are reporting it too, usually as a warranty item, and with much lower numbers than I would have expected.

For example, in the prosumer space, both WD's Red Pro and Gold HDDs report[1] their endurance limit as 550TB/year of total bytes "transferred to or from the hard drive", regardless of drive size.

[1] See Specifications, and especially their footnote 1 at the bottom of the page: https://www.westerndigital.com/products/internal-drives/wd-r...
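
To put that in perspective, here's a quick back-of-envelope sketch (the capacities below are just illustrative):

    # How many full-drive passes per year a fixed 550TB/year workload
    # rating allows as capacity grows (illustrative capacities only).
    WORKLOAD_TB_PER_YEAR = 550

    for capacity_tb in (4, 8, 16, 22):
        passes = WORKLOAD_TB_PER_YEAR / capacity_tb
        print(f"{capacity_tb:>2}TB drive: ~{passes:.0f} full passes/year")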


The endurance figures for hard drives are probably derived from the rated number of seek operations for the heads, which is why it doesn't matter whether the operations are for reading or writing data. But that bakes in some assumptions about the mix of random vs sequential IO. And of course the figures are subject to de-rating when the company doesn't want the warranty to cover anything close to the real expected lifespan, especially for products further down the lineup.


I buy hard drives based on these reports. Thank you Backblaze.


Where do you buy your drives? Last time I was in the market, I couldn't find a reputable seller selling the exact models in the report. I'm afraid that the less reputable sellers (random 3rd party sellers on Amazon) are selling refurbished drives.

I ended up buying a similar-sounding but not identical model from CDW.


These are useful data points, but I've found that at my risk tolerance level, I get a lot more TB/$ buying refurbished drives. Amazon has a couple of sellers that specialize in server pulls from datacenters; even after 3 years of minimal use, the vendors provide 5 years of additional warranty to you.


> even after 3 years of minimal use, the vendors provide 5 years of additional warranty to you.

The Amazon refurb drives (in this class) typically come with 40k-43k hours of data center use. Generally they're well used for 4½-5yrs. Price is ~30% of new.

I think refurb DC drives have their place (replaceable data). I've bought them - but I followed other buyers' steps to maximize my odds.

I chose my model (an HGST) carefully, put it through an intensive 24h test, and checked SMART stats afterward.

As far as the 5yr warranty goes, it's from the seller, and they don't all stick around for 5 years. But they are around for a while, so test that drive heavily after purchase.
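
For the curious, a minimal sketch of that kind of post-purchase check with smartmontools (/dev/sdX is a placeholder, point it at the right drive, and expect the long self-test to take many hours on big drives):

    # Minimal post-purchase check sketch (assumes smartmontools is installed;
    # /dev/sdX is a placeholder device path).
    import subprocess

    DEV = "/dev/sdX"

    # Kick off the drive's built-in extended (long) self-test.
    subprocess.run(["smartctl", "-t", "long", DEV])

    # Hours later, once the self-test has finished, dump the full SMART
    # report for manual review. smartctl uses bit-mask exit codes, so a
    # non-zero return is not necessarily a hard failure of the command.
    report = subprocess.run(["smartctl", "-a", DEV], capture_output=True, text=True)
    print(report.stdout)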


Buying refurbished also makes it much easier to avoid having the same brand/model/batch/uptime, for firmware and hardware issues. I do carefully test for bad sectors and verify capacity, just in case.


I think you're better off buying used and using the savings for either mirroring or off-site backup. I'd take two mirrored used drives from different vendors over one new drive any day.


There was a Backblaze report a while ago that said, essentially, that most individual drives are either immediate lemons or run to warranty.

If you buy used, you're avoiding the first form of failure.


Indeed: RAID used to stand for Redundant Array of Inexpensive Disks. The point was to throw a bunch of disks together, and with redundancy it didn't matter how unreliable they were. Using blingy drives with RAID feels counterintuitive, at least as a hobbyist.


A lot of those resellers do not disclose that the drive isn't new, even labeling the item as new.

GoHardDrive is notorious for selling "new" drives with years of power-on time. Neither Newegg nor Amazon seems to do anything about those sellers.


Any specific sellers you'd recommend?


Refurbed drives have a MUCH HIGHER failure rate. I used to send back lots of drives to Seagate; they'd come back with the service sticker, and that meant trouble. YMMV


These generally aren't refurbed drives, they are used drives that sat in a datacenter for 3-5 years.


In Europe, LambdaTek is my go-to for enterprise hardware as a retail customer.


Lots of good options here: https://diskprices.com/



Note that they list at least one vendor as selling "New" drives when they are not even close to being new.


It's definitely scraped with a few simple queries and not moderated by a human, so you have to manually check before buying, of course. It just saves a few minutes by automating the initial search.


I think there will eventually be a false advertising lawsuit or some regulatory action against Amazon about this. Until that happens, it’s hard to say for certain which items are used.


And for stuff like this, many companies will have an approved vendor, and you have to buy what they offer or go through a justification for an exception.


B&H has quite a few


I guess it isn't that surprising given the path its development took, but it is always funny to me that one of the most reputable consumer tech companies is a photography place.


Similar to how the most popular online retailer is a bookstore. Successful businesses are able to expand and I wish B&H the best of luck on that path, we need more companies like them.


I'd rather companies stick to one thing and do it well, rather than expand into every industry out there and slowly creep into every facet of society.

Like that bookstore that just happens to retail some stuff too.


B&H seems to be pretty focused on techy things (and cameras of all sorts have always been techy things, though that corner of the tech market has been declining for a long time now).

When they branch out to selling everything including fresh vegetables, motor oil, and computing services, then maybe they might be more comparable to the overgrown bookstore.


I definitely lean towards B&H for electronic things. It's quite a bit less of an "internet flea market" than Amazon often is.


There used to be a much more distinct market for cameras, and all the ancillary gear and consumables, than there is now. Though B&H still sells a ton of lighting and audio gear, as well as printers and consumables for same.

They sell other stuff too but they’re still pretty photo and video-centric, laptops notwithstanding.


AWS alone is a Fortune 500 company


I buy most, but not all, of my tech at B&H and have now for more than a decade. Especially peripherals.


What's the risk of buying from Amazon and running a SMART/CrystalDiskInfo check?


I don’t buy hard drives based on these reports. I buy SSDs and let my cloud providers deal with hard drives.


> The 4TB Toshiba (model: MD04ABA400V) are not in the Q1 2024 Drive Stats tables. This was not an oversight. The last of these drives became a migration target early in Q1 and their data was securely transferred to pristine 16TB Toshiba drives.

That's a milestone. Imagine the racks that were eliminated


> That's a milestone. Imagine the racks that were eliminated

I'm imagining about 3/4ths ;)


I'm imagining 4x capacity


3/4ths of the racks that had 4TB drives, assuming they didn't also expand capacity as part of this.

But they run many drive types.


Perhaps not eliminated, but repurposed with fresh 16TB drives. And the power savings per byte stored!


Yeah, but just thinking about it reminds me how annoyed I am that they increased the B2 pricing by 20% last year.

Right after launching B2, in late 2015, they made their post about storage pod 5.0, saying it "enabled" B2 at the $5/TB price, at 44 cents per gigabyte and a raw 45TB per rack unit.

In late 2022 they posted about supermicro servers costing 20 cents per gigabyte and fitting a raw 240TB per rack unit.

So as they migrate or get new data, that's 1/5 as many servers to manage, costing about half as much per TB.

It's hard to figure out how the profit margin wasn't much better, despite the various cost increases they surely had to deal with.

The free egress based on data stored was nice, but the change still stings.

Maybe I'm overlooking something but I'm not sure what it would be.

In contrast the price increases they've had for their unlimited backup product have always felt fine to me. Personal data keeps growing, and hard drive prices haven't been dropping fast. Easy enough. But B2 has always been per byte.

And don't think I'm being unfair and only blaming them because they release a lot of information. I saw hard drives go from 4TB to 16TB myself, and I would have done a similar analysis even if they were secretive.
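
For what it's worth, the ratios quoted above do work out roughly as described:

    # Quick check of the ratios quoted above (figures as given in the posts cited).
    old_tb_per_u, old_cents_per_gb = 45, 44     # storage pod 5.0 era
    new_tb_per_u, new_cents_per_gb = 240, 20    # 2022 Supermicro servers

    print(new_tb_per_u / old_tb_per_u)          # ~5.3x the density -> ~1/5 the servers
    print(new_cents_per_gb / old_cents_per_gb)  # ~0.45x -> about half the cost per TB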


Inflation. At the rate it went up the last couple of years, a 20% price increase to put them back on the right side of profits is more than probable.


Maybe I wasn't clear, but the hardware costs and the operation costs should all have dropped between 2x and 5x as a baseline before price increases.

Inflation is not even close to that level.

And those hardware costs already take into account inflation up through the end of 2022.


> the hardware costs and the operation costs should all have dropped between 2x and 5x

That would work if they fully recouped the costs of obtaining and running the drives, including racks, PSUs, cases, drive and PSU replacements, control boards, datacenter/whatever costs, electricity, HVAC etc. and generated a solid profit not only to buy all the new hardware but a new yacht for the owners too.

But usually that is not how it works, because nobody sane buys the hardware with cash. And even if they have fancy new 240TB-per-rack-unit servers, that doesn't mean they migrated outright and threw out the old ones ASAP.

So while the new rack units cost 5x less per U, that doesn't translate into a 5x lower cost of storage for what they sell.


I would sure hope the original units were recouped after 8 years.

You can look at their stats and see that the vast majority of their data is on 12-16TB drives, and most of the rest is on 8TB drives. Even with those not being the very newest and cheapest models, their average server today is a lot denser and cheaper than their brand new servers were 8 years ago.


TL;DR Business is hard.


There's also storage inflation on the users' side. People have more data on bigger drives that wants a backup.


This is B2, the service that charges per byte. More data makes it easier for them to profit.


I wonder how the pricing works out. I look at the failure rates and my general takeaway is "buy Western Digital" for my qty 1 purchases. But if you look within a category, say 14TB drives, they've purchased 4 times as many Toshiba drives as WD. Are the vendors pricing these such that it's worth a slightly higher failure rate to get the $/TB down?


If you are a large company owning hundreds of thousands of them and knowing you will have disk failures regardless, maybe. If you own just a few hundred and a failure costs you money, the logic may be completely different.


I'd assume so. Also consider that if a drive fails under warranty, and you're already dealing with a bunch of failing drives on a regular basis, the marginal cost to get a warranty replacement is close to zero.


Amazing these have continued. I base my NAS purchase decisions on these and so far they haven't led me astray.


How would they lead you astray? I wouldn't consider a drive failure in a home NAS to indicate that: even their most statistically reliable drives still require redundancy/backup. If you haven't experienced a drive failure yet, that's just chance.


Well.. that might be true for a lot of normal NASes with 8 drives or fewer.

I on the other hand have a 4U 48 bay Chenbro so drive failures are somewhat significant for me lol.

Redundancy-wise it's 4 raidz2 vdevs with 12 drives each, backed up to rsync.net. I have had 2 drive failures: one shortly after commissioning, and the other a few months ago, which was pretty random.

I'm using HGST drives, specifically 8TB He8 and they have been really solid in operation since 2016. I don't have any spares left now though so when I get back to where the chassis is hosted I will be doing a rebuild onto 16TB drives.

On the other hand in my professional life I experienced arrays that had multiple drives fail in quick succession (especially around 2010-2012 era) from less ... reliable brands cough Seagate cough.

So I would consider 2 failures from ~1.5M drive hours to be very good and thank Backblaze for convincing me to shell out on these rather more expensive drives.


I'd note a couple things:

1. 2 failures across 1.5M hours is something around a 1.2% AFR, which is good, but not significantly below Backblaze's average. Definitely better than the stats on their worst drives.

2. Assuming some premium for higher-reliability drives, and some required storage growth over time, the most efficient drives to buy are those that fail at exactly the rate that lets you replace them with higher-capacity drives as needed for storage growth. I'm personally at the point where I'm decommissioning 2TB drives with 100k hours; I'd be better off having saved some money and having the drives fail now.
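
A quick sketch of both points; the prices and AFRs in the second half are made up, purely to illustrate the tradeoff:

    # Checking the AFR figure, plus a toy cost comparison
    # (prices and AFRs below are made up for illustration).
    HOURS_PER_YEAR = 8766  # 365.25 days

    # Point 1: 2 failures across ~1.5M drive-hours
    drive_years = 1.5e6 / HOURS_PER_YEAR
    print(f"AFR ~= {2 / drive_years:.1%}")          # ~1.2%

    # Point 2: a modest reliability premium often doesn't pay if the drives
    # will be replaced for capacity reasons anyway.
    def expected_cost(price, afr, planned_years):
        # crude approximation: each expected failure = one replacement purchase
        return price * (1 + afr * planned_years)

    print(expected_cost(price=300, afr=0.010, planned_years=6))  # pricier, more reliable
    print(expected_cost(price=250, afr=0.015, planned_years=6))  # cheaper, less reliable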


Ever since Backblaze started doing these, there has been a dedicated set of Seagate fanboys (I know, it's the oddest thing to fanboy over) who come up with literally any excuse to avoid acknowledging that Seagate might have higher than normal failure rates, and that has included throwing shade at "well, you don't know the failure rate of those Toshibas and WDs in home usage!!!".


Which specific ones do you like so far?


I have had 48 HGST He8 8TB drives online since 2016. 2 failures in that time: one was replaced under warranty, the other happened recently.


They are king kong. After they started publishing these, Seagate seemingly started selling less and less trash. I had so many Seagate drives going south. Bleh. Would be nice to see SSD drive stats too. There are so many terrible SSDs out there, like SP, which has utter trash controllers. One day your drive gets locked up without any forewarning, and your data just disappears.


Backblaze has been slowly releasing SSD stats, but their usage of HDDs dwarfs SSD usage, so it won’t be as useful.

https://www.backblaze.com/blog/ssd-edition-2023-mid-year-dri...


As with every time these come out, remember that Backblaze's usage pattern is different from yours!

Well, unless you're putting large numbers of consumer SATA drives into massive storage arrays with proper power and cooling in a data center.


I find the stats interesting, but it's hard to actually inform any decisions because by the time the stats come out, who knows what's actually shipping.


Does Backblaze ever buy refurbs? I'm guessing not, but I'd be curious to see any data on how failure rates compare after manufacturers recertify.


I can't think of any reason why the lifetime would be any different for a refurb. Of course, you need to count from when the drive was originally put into use, and there is probably also some additional wear and tear just due to the removal, handling, and additional shipping of the drives.


In some ways that would be incredibly noisy to test. However, it could be a good way to measure the practical value of S.M.A.R.T. metrics; finding out how accurate they are at predicting HDD lifespan would be a great result.


Does anyone find value in SMART metrics?

In my experience, the drives report "healthy" until they fail, then they report "failed"

I've personally never tracked the detailed metrics to see if anything is predictive of impending failure, but I've never seen the overall status be anything but "healthy" unless the drive had already failed.


The SMART metrics aren't binary, and any application that is presenting them as binary (Either HEALTHY or FAILED) is doing you a disservice.

> I've personally never tracked the detailed metrics to see if anything is predictive of impending failure

Backblaze has!

https://www.backblaze.com/blog/hard-drive-smart-stats/


From that link:

From experience, we have found the following five SMART metrics indicate impending disk drive failure:

    SMART 5: Reallocated_Sector_Count.
    SMART 187: Reported_Uncorrectable_Errors.
    SMART 188: Command_Timeout.
    SMART 197: Current_Pending_Sector_Count.
    SMART 198: Offline_Uncorrectable.
That's good to know, I might start tracking that. I manage several clusters of servers and hard drive failures just seem pretty random.
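
If I do, a rough sketch of what that check could look like, assuming smartmontools; the attribute table layout varies a bit by vendor, so treat it as a starting point:

    # Flag non-zero raw values for the five attributes Backblaze calls out.
    # (Assumes smartmontools; matches on the attribute ID in the first column
    # because attribute names vary between vendors.)
    import subprocess

    WATCHED_IDS = {"5", "187", "188", "197", "198"}

    def suspicious_attributes(device):
        out = subprocess.run(["smartctl", "-A", device],
                             capture_output=True, text=True).stdout
        flagged = []
        for line in out.splitlines():
            fields = line.split()
            # Attribute rows start with the numeric ID; the raw value is last.
            if fields and fields[0] in WATCHED_IDS and fields[-1] != "0":
                flagged.append(line.strip())
        return flagged

    print(suspicious_attributes("/dev/sdX"))  # placeholder device path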


I've had several hard drives that started gradually increasing their reallocated sector count, then started getting reported uncorrectable errors, then eventually just gave up the ghost. Usually whenever reallocated sectors start climbing, a drive is nearing death and should be replaced as soon as possible. You might not have had corruption yet, but it's coming. Once you get UREs you've lost some data.

However, one time a drive got a burst of reallocated sectors, it stabilized, then didn't have any problems for a long time. Eventually it wouldn't power on years later.


Absolutely. I've looked at the SMART data of easily over 1000 drives. Many of them ok, many of them with questionable health, many failing and many failed. The SMART data has always been a valuable indicator as to what's going on. You need to look at the actual values given by tools like smartctl or CrystalDiskInfo. Everything you need to evaluate the state of your drives is there.

I've never seen an HDD fail overnight without any indication at all.


I've had an M.2 NVMe drive start reporting bad blocks via SMART. I kept using it for non-critical storage, but replaced it as my boot drive. Obviously not the same failure pattern as spinning rust, but I was glad for the early warning anyway.


Refurbed drives, at least for Seagate, are terrible.


Says the annual failure rate is 1.5%, but average time to failure is 2.5 years? Those numbers don't line up.

Are most drives retired without failing?


> Are most drives retired without failing?

I'd expect so, given that HDDs are still seeing significant density advancements. After a while, old drives aren't worth the power and sled/rack space that could be used for a higher-capacity drive. And, yeah, it makes these statistics make more sense together.

Edit: plus they are just increasing drive count so most drives haven't hit the time when they would fail or be retired...


"Are most drives retired without failing?"

Yes, certainly.

One can watch both SMART indicators as well as certain ZFS stats and catch a problem drive before it actually fails.

I like to remove drives from zpools early because there is a common intermediate state they can fall into where they have not failed out but dramatically impact ZFS performance as they timeout/retry certain operations thousands and thousands of times.
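
A minimal sketch of one way to watch for that, parsing the per-device READ/WRITE/CKSUM counters out of zpool status ("tank" is a placeholder pool name; per-device latency via zpool iostat -vl is worth a look too, if your OpenZFS version has it):

    # Flag any device row in `zpool status` with non-zero READ/WRITE/CKSUM
    # counters (layout assumed from stock OpenZFS output).
    import subprocess

    def noisy_devices(pool):
        out = subprocess.run(["zpool", "status", pool],
                             capture_output=True, text=True).stdout
        flagged = []
        for line in out.splitlines():
            fields = line.split()
            # Device rows look like: NAME STATE READ WRITE CKSUM
            if (len(fields) >= 5 and all(f.isdigit() for f in fields[2:5])
                    and fields[2:5] != ["0", "0", "0"]):
                flagged.append(line.strip())
        return flagged

    print(noisy_devices("tank"))  # placeholder pool name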


What's the best way to monitor those ZFS stats? I just rely on scheduled ZFS scrubs, and the occasional `zpool status -v`...


Yes, and because of that, the numbers on the average time to failure are completely meaningless. The drives that don't ever fail skew the numbers completely. If a fantastically reliable drive were to have 5/5000 drives fail, but they all failed in the first month and then the rest carried on forever, then that would show here as having a lower "reliability" than a dire drive where 4000/5000 drives fail after a year.

I'd like to see instead something like mean time until 2% of the drives fail. That'd actually be comparable between drives. And yes, it would also mean that some drive types haven't reached 2% failure yet, so they'd be shown as ">X months".

This is what a Kaplan-Meier survival curve was meant for [0]. Please use it.

Also, it'd be great to see the confidence intervals on the annualised failure rates.

[0] https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator
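
For the curious, a minimal sketch of what that would look like with the lifelines package, using made-up toy numbers (1 = failed, 0 = censored, i.e. retired or still running):

    # Kaplan-Meier survival estimate sketch (toy data; assumes the
    # lifelines package is installed).
    from lifelines import KaplanMeierFitter

    ages_days = [120, 400, 800, 800, 1500, 2200, 2200, 3000]
    failed    = [1,   0,   1,   0,   0,    1,    0,    0]   # 0 = censored

    kmf = KaplanMeierFitter()
    kmf.fit(ages_days, event_observed=failed, label="toy drive model")

    # Censored drives (retired or still running) are handled properly here,
    # instead of skewing an "average time to failure" of failed drives only.
    print(kmf.survival_function_)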


Obviously yes. At an AFR of 1.5% they'd have to have the drives run for (about) 67 years to have them all retire from failure.

(in reality they'd probably have failure rates spike at some point, but the idea stands. And they explicitly said they retired a bunch of 4TBs)


Drives have warranties, after which the manufacturer doesn't make any claims about their durability. This could put your fleet at wild and significant risk if things start hitting a wall and failing en masse. You may not be able to rebuild your way out if, as you're rebuilding, you're copying the data onto yet another dying drive.

So you usually have lifetime drive-throughput and start/stop limits you want to stay under, and depending on how accurate your data is for each drive, you may push beyond the drive warranties. But you will generally stop before the drive actually fails.


They just retired 4TB ones.

While they seem to get retired, it's not as quick as we'd think.


> A Few Good Zeroes: In Q1 2024, three drive models had zero failures

They go on to list 3 Seagate models that share one common factor: sharply lower drive counts. Backblaze had a lot fewer of these drives.

All of their models with <5 failures are low-quantity drives.

I have confidence in the rest of their report - but not with the inference that those 3 Seagate models are more reliable.


This uncertainty should be accounted for in the confidence intervals of their stats.


https://youtu.be/IgJ6YolLxYE

This video presents AFR (annualized failure rate) figures derived from prior Backblaze reports, aggregated.

Definitely worth a watch if you're interested in this report.


Looks like WDC reliability has improved a lot in the past decade.

Seagate continues to trail behind competitors.

I guess they're basically competing on price? Because with data like this, I don't know why anyone running a data center would buy Seagate over WD.


The WDC models which are only somewhat more expensive than Toshiba or Seagate tend to perform quite a lot worse than those. Models with the same performance are significantly more expensive.


Can't thank backblaze enough...


I always click these every time they come up. Can't tell you how much I appreciate them releasing stats like this!


Why no Samsung?


And people will still say they don't trust Seagate because of the 3TB drives that failed over a decade ago.


Anecdata is such a weird thing. In my own NAS, I've had 3 out of 3 WD Red drives, each a different size, die in an identical manner well before their warranty expired over the last several years. SMART says everything is fine, but the drive's utilization creeps up to a constant 100% and its write IOPS decrease until the whole array is slow as frozen molasses. That's in a constantly comfortable operating environment that's never too hot, cold, or otherwise challenging. And yet it looks like I'm the statistical outlier. Other people -- like Backblaze here -- have decent luck with the same drives that have a 100% failure rate here.

Probability is a strange thing, yo. The odds of a specific person winning the lottery are effectively 0, but someone's going to. Looks like I've won the "WD means Waiting Death" sweepstakes.


Sounds like you're a victim of WD selling Reds with Shingled Magnetic Recording (SMR). Quite a scandal a few years ago.

SMR takes advantage of the fact that read heads are smaller than write heads, so it "shingles" (overlaps) the tracks to get better density. However, if you need to rewrite in between tracks that are full, you need to shuffle the data around so it can re-shingle the tracks. This means as your array gets full, or even just fragmented, your drives can start needing to shuffle data all over the place to rewrite a random sector. This is hell on drives in an array, since a lot of controllers have no knowledge of this shingling behavior.

Shingled drives are OK when you're just constantly writing a stream of data and not doing a lot of rewriting in between. Think security cameras and database backups and whatnot. They're complete hell if you're doing lots of random files that get a lot of modifications.

https://www.servethehome.com/wd-red-smr-vs-cmr-tested-avoid-...


No, these were 100% CMR drives. I checked them very closely when the scandal broke and confirmed that mine were not shingled.


Huh, weird, because that's 100% the failure mode friends of mine who did have shingled drives experienced. Maybe your drives were shingled despite the labeling suggesting otherwise, or maybe they had some different failure mode and it wasn't SMR that killed the arrays in the end.

Either way it made me never want to use WD for drives in arrays and not trust their labeling anymore. "WD Red" drives lost all meaning to me; who knows what they're doing inside.


> Maybe your drives were shingled despite labeling suggesting otherwise

I'm not ruling that out. The whole debacle was so amazingly tonedeaf that I wouldn't be surprised if they did that behind the scenes. I wrote this at the time: https://honeypot.net/2020/04/15/staying-away-from.html


Indeed, and anecdata is weighted so heavily by our minds, even when we are aware of it and consciously look at the numbers. That's what evolution gives us though. The best brains at survival are the ones that learned from their observations, so we're battling our nature by trying to disregard that. I'll never buy another Seagate because of that one piece of shit I got :-D


I've had so many Seagate drives fail that I won't buy Seagate again.

If a brand sells bad drives, they should be aware of the reputational damage it causes. Otherwise there is no downside to selling bad drives.


If you buy drives based on their reports, make sure your drives are operating within the same environmental parameters, or these stats may not apply.



