I, too, love Backblaze's reports, but they provide no information regarding drive endurance. I first became aware of this with SSDs, but HDD manufacturers are reporting it too, usually as a warranty item, and with lower numbers than I would have expected.
For example, in the prosumer space, both WD's Red Pro and Gold HDDs report[1] their endurance limit as 550TB/year of total bytes "transferred to or from the hard drive", regardless of drive size.
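To put 550TB/year in perspective, here's a quick back-of-the-envelope sketch; the 20TB capacity and the monthly scrub are just illustrative assumptions on my part, not anything from WD's spec sheet:

    # Rough sketch: how fast routine full-drive reads eat into a 550 TB/year
    # workload rating. Drive size and scrub cadence are made-up examples.
    capacity_tb = 20          # hypothetical drive size
    rating_tb_per_year = 550  # WD's stated limit, reads and writes combined

    scrub_tb_per_year = capacity_tb * 12   # one full-drive read per month
    print(f"monthly scrubs alone: {scrub_tb_per_year} TB/year, "
          f"{scrub_tb_per_year / rating_tb_per_year:.0%} of the rating")
    # -> 240 TB/year, about 44% of the budget before any real workload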
The endurance figures for hard drives are probably derived from the rated number of seek operations for the heads, which is why it doesn't matter whether the operations are for reading or writing data. But that bakes in some assumptions about the mix of random vs sequential IO. And of course the figures are subject to de-rating when the company doesn't want the warranty to cover anything close to the real expected lifespan, especially for products further down the lineup.
Where do you buy your drives? Last time I was in the market, I couldn't find a reputable seller selling the exact models in the report. I'm afraid that the less reputable sellers (random 3rd party sellers on Amazon) are selling refurbished drives.
I ended up buying a similar-sounding but not identical model from CDW.
These are useful data points, but I've found that at my risk tolerance level, I get a lot more TB/$ buying refurbished drives. Amazon has a couple of sellers that specialize in server pulls from datacenters; even after 3 years of minimal use, the vendors provide 5 years of additional warranty to you.
> even after 3 years of minimal use, the vendors provide 5 years of additional warranty to you.
The Amazon refurb drives (in this class) typically come with 40k-43k hours of data center use. Generally they're well used for 4½-5yrs. Price is ~30% of new.
I think refurb DC drives have their place (replaceable data). I've bought them - but I followed other buyers' steps to maximize my odds.
I chose my model (of HGST) carefully, put it through an intensive 24h test, and checked SMART stats afterward.
As far as the 5yr warranty goes, it's from the seller, and they don't all stick around for 5 years. But they are around for a while, so heavily test that drive right after purchase.
Buying refurbished also makes it much easier to avoid having drives with the same brand/model/batch/uptime, which helps with correlated firmware and hardware issues. I do carefully test for bad sectors and verify capacity, just in case.
I think you're better off buying used and using the savings for either mirroring or off-site backup. I'd take two mirrored used drives from different vendors over one new drive any day.
Indeed - RAID used to stand for Redundant Array of Inexpensive Disks. The point was to throw a bunch of disks together, and with redundancy it didn't matter how unreliable they were. Using blingy drives with RAID feels counter-intuitive, at least as a hobbyist.
Refurbed drives have a MUCH HIGHER failure rate. I used to send back lots of drives to Seagate; they'd come back with the service sticker, and that meant trouble. YMMV.
It's definitely scraped with a few simple queries and not moderated by a human; you have to manually check before buying, of course. It just saves a few minutes by automating the initial search.
I think there will eventually be a false advertising lawsuit or some regulatory action against Amazon about this. Until that happens, it’s hard to say for certain which items are used.
And for stuff like this, many companies will have an approved vendor, and you have to buy what they offer or go through a justification for an exception.
I guess it isn’t that surprising given the path the development took, but it is always funny to me that one of the most reputable consumer tech companies is a photography place.
Similar to how the most popular online retailer is a bookstore. Successful businesses are able to expand and I wish B&H the best of luck on that path, we need more companies like them.
B&H seems to be pretty focused on techy things (and cameras of all sorts have always been techy things, though that corner of the tech market has been declining for a long time now).
When they branch out to selling everything including fresh vegetables, motor oil, and computing services, then maybe they might be more comparable to the overgrown bookstore.
There used to be a much more distinct market for cameras and all the ancillary gear and consumables than there is now. Though B&H still sells a ton of lighting and audio gear, as well as printers and consumables for same.
They sell other stuff too but they’re still pretty photo and video-centric, laptops notwithstanding.
> The 4TB Toshiba (model: MD04ABA400V) are not in the Q1 2024 Drive Stats tables. This was not an oversight. The last of these drives became a migration target early in Q1 and their data was securely transferred to pristine 16TB Toshiba drives.
That's a milestone. Imagine the racks that were eliminated
Yeah, but just thinking about it reminds me how annoyed I am that they increased the B2 pricing by 20% last year.
Right after launching B2, in late 2015, they made their post about storage pod 5.0, saying it "enabled" B2 at the $5/TB price, at 44 cents per gigabyte and a raw 45TB per rack unit.
In late 2022 they posted about supermicro servers costing 20 cents per gigabyte and fitting a raw 240TB per rack unit.
So as they migrate or get new data, that's 1/5 as many servers to manage, costing about half as much per TB.
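Re-deriving the ratios from just those two quoted figures:

    # Sketch: density and hardware-cost ratios from the two posts above.
    old_tb_per_u, old_cost_per_gb = 45, 0.44    # 2015 storage pod 5.0
    new_tb_per_u, new_cost_per_gb = 240, 0.20   # 2022 Supermicro servers

    print(f"density: {new_tb_per_u / old_tb_per_u:.1f}x more raw TB per rack unit")
    print(f"hardware cost per TB: {new_cost_per_gb / old_cost_per_gb:.0%} of the old figure")
    # -> ~5.3x denser, and roughly 45% (about half) of the old cost per TB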
It's hard to figure out how the profit margin wasn't much better, despite the various cost increases they surely had to deal with.
The free egress based on data stored was nice, but the change still stings.
Maybe I'm overlooking something but I'm not sure what it would be.
In contrast the price increases they've had for their unlimited backup product have always felt fine to me. Personal data keeps growing, and hard drive prices haven't been dropping fast. Easy enough. But B2 has always been per byte.
And don't think I'm being unfair and only blaming them because they release a lot of information. I saw hard drives go from 4TB to 16TB myself, and I would have done a similar analysis even if they were secretive.
> The hardware costs and the operation costs should all have dropped between 2x and 5x
That would work if they fully recouped the costs of obtaining and running the drives - including racks, PSUs, cases, drive and PSU replacements, control boards, datacenter/whatever costs, electricity, HVAC, etc. - and generated a solid profit, enough not only to buy all the new hardware but a new yacht for the owners too.
But usually that is not how it works, because nobody sane buys the hardware with cash. And even if they have fancy new 240TB/rack units, that doesn't mean they just migrated outright and threw the old ones away ASAP.
So while the new rack units have a 5x lower cost per U, that doesn't translate into a 5x lower cost for the storage they sell.
I would sure hope the original units were recouped after 8 years.
You can look at their stats and see that the vast majority of their data is on 12-16TB drives, and most of the rest is on 8TB drives. Even with those not being the very newest and cheapest models, their average server today is a lot denser and cheaper than their brand new servers were 8 years ago.
I wonder how the pricing works out. I look at the failure rates and my general takeaway is "buy Western Digital" for my qty 1 purchases. But if you look within a category, say 14TB drives, they've purchased 4 times as many Toshiba drives as WD. Are the vendors pricing these such that it's worth a slightly higher failure rate to get the $/TB down?
If you are a large company owning hundreds of thousands of them and knowing you will have disk failures regardless, maybe. If you own just a few hundred and a failure costs you money, the logic may be completely different.
I'd assume so. Also consider that if a drive fails under warranty, and you're already dealing with a bunch of failing drives on a regular basis, the marginal cost to get a warranty replacement is close to zero.
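As a toy illustration of that trade-off (every price, AFR, and labor figure below is a made-up placeholder, not a real quote):

    # Toy sketch: is a cheaper drive with a higher AFR worth it at fleet scale?
    # All numbers are hypothetical placeholders.
    DRIVE_TB, YEARS, LABOR_PER_SWAP = 14, 5, 5.0   # assume warranty covers the hardware

    def cost_per_tb(price, afr):
        # purchase price plus expected swap labor over the service life, per TB
        return (price + afr * YEARS * LABOR_PER_SWAP) / DRIVE_TB

    print(f"cheaper, less reliable:  ${cost_per_tb(price=210, afr=0.015):.2f}/TB")
    print(f"pricier, more reliable:  ${cost_per_tb(price=240, afr=0.005):.2f}/TB")
    # with these made-up numbers, the extra failures cost far less than the
    # price premium on the more reliable drive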
How would they lead you astray? I wouldn't consider a drive failure in a home NAS to indicate that - even their most statistically reliable drives still require redundancy/backup - if you haven't experienced a drive failure yet, that's just chance.
Well... that might be true for a lot of normal NAS setups with 8 drives or fewer.
I on the other hand have a 4U 48 bay Chenbro so drive failures are somewhat significant for me lol.
Redundancy wise it's 4 raidz2 vdevs with 12 drives each and backed up to rsync.net
I have had 2 drive failures: one was shortly after commissioning, and the other happened a few months ago, which was pretty random.
I'm using HGST drives, specifically 8TB He8 and they have been really solid in operation since 2016. I don't have any spares left now though so when I get back to where the chassis is hosted I will be doing a rebuild onto 16TB drives.
On the other hand in my professional life I experienced arrays that had multiple drives fail in quick succession (especially around 2010-2012 era) from less ... reliable brands cough Seagate cough.
So I would consider 2 failures from ~1.5M drive hours to be very good and thank Backblaze for convincing me to shell out on these rather more expensive drives.
1. 2 failures across 1.5M hours is something around a 1.2% AFR (quick math below), which is good, but not significantly below Backblaze's average. Definitely better than the stats on their worst drives.
2. Assuming some premium for higher-reliability drives, and some required storage growth over time, the most efficient drives to buy are those that fail at exactly the rate that lets you replace them with higher-capacity drives as needed for storage growth. I'm personally at the point where I'm decommissioning 2TB drives with 100k hours; I'd be better off having saved some money and having the drives fail now.
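For anyone who wants to check the 1.2% figure in point 1, it's the usual failures-per-drive-year arithmetic:

    # Annualized failure rate from raw drive hours: failures / drive-years.
    failures, drive_hours = 2, 1.5e6
    drive_years = drive_hours / (24 * 365)
    print(f"AFR ~= {failures / drive_years:.1%}")   # ~1.2%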
ever since backblaze started doing these there has been a dedicated set of seagate fanboys (I know, it’s the oddest thing to fanboy over) who come up with literally any excuse to avoid acknowledging that seagate might have higher than normal failure rates, and that has included throwing shade at “well, you don’t know the failure rate of those Toshibas and WDs in home usage!!!”.
They are kingkong. After they started publishing these, Seagate seemingly sold less and less trash. I had so many Seagate drives go south. Bleh. Would be nice to see SSD drive stats too. There are so many terrible SSDs out there, like SP, which has utter trash controllers. One day your drive gets locked up without any forewarning, and your data just disappears.
I find the stats interesting, but it's hard to actually inform any decisions because by the time the stats come out, who knows what's actually shipping.
I can't think of any reason why the lifetime would be any different for a refurb; of course, you need to count from when the drive was originally put into use. There is probably also some additional wear and tear just from the removal, handling, and extra shipping of the drives.
In some ways that would be incredibly noisy to test. However, it could be a good way to measure the practicality of S.M.A.R.T. metrics. Finding out how accurate they are at predicting HDD lifespan would be genuinely useful.
In my experience, the drives report "healthy" until they fail, then they report "failed"
I've personally never tracked the detailed metrics to see if anything is predictive of impending failure, but I've never seen the overall status be anything but "healthy" unless the drive had already failed.
I've had several hard drives that started gradually increasing a reallocated sector count, then started reporting uncorrectable errors, then eventually just gave up the ghost. Usually whenever reallocated sectors start climbing, a drive is nearing death and should be replaced as soon as possible. You might not have had corruption yet, but it's coming. Once you get UREs, you've lost some data.
However, one time a drive got a burst of reallocated sectors, it stabilized, then didn't have any problems for a long time. Eventually it wouldn't power on years later.
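If you want to catch that progression early, here's roughly the check I'd script. It assumes smartmontools 7+ for JSON output and an ATA/SATA drive; attribute IDs and raw-value formats vary by vendor, so treat it as a starting point rather than gospel:

    # Sketch: flag the SMART attributes that usually precede the slow-death
    # pattern described above. Needs root and smartmontools 7+ (-j for JSON).
    import json, subprocess

    WATCH = {5: "Reallocated_Sector_Ct", 187: "Reported_Uncorrect",
             197: "Current_Pending_Sector", 198: "Offline_Uncorrectable"}

    def worrying_attributes(device="/dev/sda"):
        out = subprocess.run(["smartctl", "-j", "-A", device],
                             capture_output=True, text=True).stdout
        table = json.loads(out).get("ata_smart_attributes", {}).get("table", [])
        return {a["name"]: a["raw"]["value"]
                for a in table if a["id"] in WATCH and a["raw"]["value"] > 0}

    print(worrying_attributes())  # anything non-zero here deserves a closer look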
Absolutely. I've looked at the SMART data of easily over 1000 drives. Many of them ok, many of them with questionable health, many failing and many failed. The SMART data has always been a valuable indicator as to what's going on. You need to look at the actual values given by tools like smartctl or CrystalDiskInfo. Everything you need to evaluate the state of your drives is there.
I've never seen an HDD fail overnight without any indication at all.
I've had an M.2 NVMe drive start reporting bad blocks via SMART. I kept using it for non-critical storage, but replaced it as my boot drive. Obviously not the same failure pattern as spinning rust, but I was glad for the early warning anyway.
I'd expect so, given that HDDs are still seeing significant density advancements. After a while, old drives aren't worth the power and sled/rack space that could go to a higher-capacity drive. And, yeah, that makes these statistics make more sense together.
Edit: plus they are just increasing drive count so most drives haven't hit the time when they would fail or be retired...
One can watch both SMART indicators as well as certain ZFS stats and catch a problem drive before it actually fails.
I like to remove drives from zpools early because there is a common intermediate state they can fall into where they have not failed out but dramatically impact ZFS performance as they timeout/retry certain operations thousands and thousands of times.
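Here's roughly the sort of check I mean, as a sketch. It scrapes the human-readable zpool status columns (NAME STATE READ WRITE CKSUM), which is fragile and may need adjusting for your layout:

    # Sketch: surface vdev members with non-zero read/write/checksum counters
    # before ZFS actually faults them.
    import subprocess

    def noisy_devices(pool):
        out = subprocess.run(["zpool", "status", pool],
                             capture_output=True, text=True).stdout
        flagged = []
        for line in out.splitlines():
            parts = line.split()
            if len(parts) >= 5 and parts[2:5] != ["READ", "WRITE", "CKSUM"]:
                errors = parts[2:5]
                if any(e.isdigit() and int(e) > 0 for e in errors):
                    flagged.append((parts[0], parts[1], errors))
        return flagged

    print(noisy_devices("tank"))  # "tank" is a placeholder pool name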
Yes, and because of that the numbers on the average time to failure are completely meaningless. The drives that don't ever fail skew the numbers completely. If a fantastically reliable drive were to have 5/5000 drives fail, but they all failed in the first month and then the rest carried on forever, then that would show here as a lower "average time to failure" than a dire drive where 4000/5000 drives fail after a year.
I'd like to see instead something like mean time until 2% of the drives fail. That'd actually be comparable between drives. And yes, it would also mean that some drive types haven't reached 2% failure yet, so they'd be shown as ">X months".
This is what a Kaplan-Meier survival curve was meant for [0]. Please use it.
Also, it'd be great to see the confidence intervals on the annualised failure rates.
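Both are quick to sketch once you reduce the raw drive-stats CSVs to one row per drive with an observed lifetime and a failed/still-running flag. A rough illustration, where the toy arrays and the lifelines/scipy approach are my own stand-ins rather than anything Backblaze publishes:

    # Sketch: (1) Kaplan-Meier survival curve on right-censored drive lifetimes,
    # including the "hours until 2% have failed" quantile suggested above, and
    # (2) an exact Poisson confidence interval for the annualized failure rate.
    import numpy as np
    from lifelines import KaplanMeierFitter
    from scipy.stats import chi2

    rng = np.random.default_rng(0)
    hours = rng.integers(1_000, 60_000, size=5_000)   # observed power-on hours
    failed = rng.random(5_000) < 0.03                 # True = failed, False = still running

    kmf = KaplanMeierFitter()
    kmf.fit(durations=hours, event_observed=failed, label="toy model")
    surv = kmf.survival_function_["toy model"]
    print("hours until ~2% cumulative failures:", surv[surv <= 0.98].index.min())

    def afr_ci(failures, drive_years, conf=0.95):
        # exact (Garwood) interval, treating failures as a Poisson count
        a = 1 - conf
        lo = chi2.ppf(a / 2, 2 * failures) / (2 * drive_years) if failures else 0.0
        hi = chi2.ppf(1 - a / 2, 2 * failures + 2) / (2 * drive_years)
        return lo, hi

    lo, hi = afr_ci(failures=failed.sum(), drive_years=hours.sum() / (24 * 365))
    print(f"AFR 95% CI: {lo:.2%} - {hi:.2%}")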
Drives have warranties, after which the manufacturer doesn't make any claims about their durability. This could put your fleet at wild and significant risk if things start hitting a wall and failing en masse. You may not be able to repair your way out if, as you're repairing, the data you're copying is going onto yet another dying drive.
So you usually have lifetime drive throughput and start/stop budgets you want to stay under, and depending on how accurate your data is for each drive, you may push beyond the drive warranties. But you will generally stop before the drive actually fails.
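If you want to see where a given drive sits against that kind of budget, many (not all) SATA drives expose lifetime host reads/writes in SMART. A rough sketch; attributes 241/242 and their raw units vary by vendor (some count 512-byte LBAs, some count larger units), so sanity-check it against a known workload first:

    # Sketch: estimate TB transferred per year from SMART lifetime counters.
    # Assumes attrs 9 (Power_On_Hours), 241 (Total_LBAs_Written) and
    # 242 (Total_LBAs_Read), with raw values in 512-byte LBAs -- NOT true
    # for every vendor, so verify before trusting the output.
    import json, subprocess

    def workload_tb_per_year(device="/dev/sda", lba_bytes=512):
        out = subprocess.run(["smartctl", "-j", "-A", device],
                             capture_output=True, text=True).stdout
        raw = {a["id"]: a["raw"]["value"]
               for a in json.loads(out)["ata_smart_attributes"]["table"]}
        tb = (raw.get(241, 0) + raw.get(242, 0)) * lba_bytes / 1e12
        return tb * (24 * 365) / raw[9]

    print(f"~{workload_tb_per_year():.0f} TB/year vs. a 550 TB/year rating")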
The WDC models that are only somewhat more expensive than Toshiba or Seagate tend to perform quite a lot worse than those; the WDC models with comparable performance are significantly more expensive.
Anecdata is such a weird thing. In my own NAS, I've had 3 out of 3 WD Red drives, each a different size, die in an identical manner well before their warranty expired over the last several years. SMART says everything is fine, but the drive's utilization creeps up to a constant 100% and its write IOPS decrease until the whole array is slow as frozen molasses. That's in a constantly comfortable operating environment that's never too hot, cold, or otherwise challenging. And yet it looks like I'm the statistical outlier. Other people -- like Backblaze here -- have decent luck with the same drives that have a 100% failure rate here.
Probability is a strange thing, yo. The odds of a specific person winning the lottery are effectively 0, but someone's going to. Looks like I've won the "WD means Waiting Death" sweepstakes.
Sounds like you're a victim of WD selling Reds with Shingled Magnetic Recording (SMR). Quite a scandal a few years ago.
SMR takes advantage of the fact that read heads are smaller than write heads, so it "shingles" the tracks to get better density. However, if you need to rewrite data in between tracks that are full, the drive has to shuffle data around so it can re-shingle those tracks. This means that as your array gets full or even just fragmented, your drives can start needing to shuffle data all over the place to rewrite a random sector. That's hell on drives in an array, since a lot of controllers have no knowledge of this shingling behavior.
Shingled drives are OK when you're just constantly writing a stream of data and not going to do a lot of rewriting in between. Think security cameras, database backups, and whatnot. They're complete hell if you have lots of random files that get a lot of modifications.
Huh, weird, because that's 100% the failure mode friends of mine who did have shingled drives experienced. Maybe your drives were shingled despite labeling suggesting otherwise, or maybe they hit some different problem that produced the same symptoms, without the SMR being what killed the arrays in the end.
Either way it made me never want to use WD for drives in arrays and not trust their labeling anymore. "WD Red" drives lost all meaning to me; who knows what they're doing inside.
Indeed, and anecdata is weighted so heavily by our minds, even when we are aware of it and consciously look at the numbers. That's what evolution gives us though. The best brains at survival are the ones that learned from their observations, so we're battling our nature by trying to disregard that. I'll never buy another Seagate because of that one piece of shit I got :-D
[1] See Specifications, and especially their footnote 1 at the bottom of the page: https://www.westerndigital.com/products/internal-drives/wd-r...