
Oh, there are all kinds of reasons drives can cause errors, and then there's the bathtub curve. So there's a lot to take into account when designing your pool.

But the article is using the spec-sheet URE rate, which I'd assume looks only at the drive itself: it doesn't account for problems with the computer around the drive, or for the end-of-life period after the warranty has expired. I'd read it as the "baseline" error rate.

> Are you tracking the device errors, or only those that are visible to the OS?

If we're talking UREs like the article, that's data loss on the disk itself, and the OS will always find out, since the bad read causes a ZFS checksum failure on the next scrub.
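A minimal sketch (plain Python, not ZFS internals) of why a scrub catches this: the checksum is stored with the block pointer rather than next to the data, so corruption the drive silently returns is still detected when the block is re-read. (ZFS defaults to fletcher4; sha256 is used here just for simplicity.)

```python
import hashlib

# Toy model of ZFS-style end-to-end checksumming: the stored checksum is
# recomputed on every read, so a URE or flipped bit is caught even when
# the drive reports the read as successful.

def write_block(data: bytes):
    checksum = hashlib.sha256(data).digest()
    return data, checksum

def scrub_block(data: bytes, checksum: bytes) -> bool:
    """Return True if the block verifies, False on a checksum failure."""
    return hashlib.sha256(data).digest() == checksum

data, cksum = write_block(b"important user file contents")
assert scrub_block(data, cksum)           # clean read verifies

corrupted = b"x" + data[1:]               # simulate silent on-disk corruption
assert not scrub_block(corrupted, cksum)  # scrub flags the error
```

With RAIDZ2 parity, a block that fails its checksum during a scrub can then be reconstructed from the remaining drives.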

In this case it's not my data and not my money, so my preference is 6-drive RAIDZ2 vdevs. We've only had one disk with errors (and that one was migrated from a PC where Windows never reported any errors... of course...). The oldest two disks (3.5 years of power-on time) have single-digit reallocated sector counts in SMART, so they're on course to be replaced.
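As an aside, watching attribute 5 is easy to script. A minimal sketch that pulls the raw Reallocated_Sector_Ct out of `smartctl -A`-style output (the sample text below is illustrative, not from a real drive):

```python
# Hypothetical helper: extract the raw Reallocated_Sector_Ct (SMART
# attribute 5) from `smartctl -A` output. SAMPLE is made-up example text.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       30660
"""

def reallocated_sectors(smart_output: str) -> int:
    for line in smart_output.splitlines():
        fields = line.split()
        # attribute ID is the first field; RAW_VALUE is the last column
        if fields and fields[0] == "5":
            return int(fields[-1])
    return 0

print(reallocated_sectors(SAMPLE))  # 8
```

In practice you'd feed this the output of `smartctl -A /dev/sdX` and alert once the raw count starts climbing.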

I'm just curious, since the argument in the article doesn't add up to me.
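For concreteness, here's the arithmetic the article's argument presumably rests on, taking the common consumer spec-sheet figure of 1 URE per 1e14 bits read and a hypothetical 12 TB full-drive read (both numbers are assumptions; check the actual datasheet):

```python
import math

URE_PER_BIT = 1e-14   # typical consumer spec-sheet rate (assumption)
DRIVE_BYTES = 12e12   # hypothetical 12 TB drive
bits = DRIVE_BYTES * 8

# P(at least one URE over a full-drive read), treating reads as independent
p_ure = -math.expm1(bits * math.log1p(-URE_PER_BIT))
print(f"{p_ure:.0%}")  # roughly 62% at face value
```

Taken literally, that predicts a failed full-drive read more often than not, which doesn't match what people actually see; real-world error rates are usually far better than the spec-sheet ceiling.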

> Within one month I had 11 of 32 disks die, and barely managed not to lose any user files, and this was not during the 1st month in production

Wow, that is some terrible luck!



