NAND Flash: How It Breaks (cushychicken.github.io)
39 points by cushychicken on Aug 6, 2015 | 7 comments


NAND Flash is cheap - in fact, in terms of cost per bit, it is one of the cheapest memory technologies on the market.

What I find most unfortunate is that almost everyone somehow gets so focused on the capacity of their nonvolatile storage devices that they don't think about longevity at all, and the industry feeds into this by advertising bigger, cheaper memories while downplaying the disadvantages. Meanwhile the "good stuff" is priced much higher than it should be. For example, multilevel cell technology only multiplicatively increases capacity, but exponentially decreases endurance and retention. This fact is seldom mentioned. 16Gbit SLC and 32Gbit 4-level (2-bit) MLC made on the same process should be the same area and price, but as of this post DRAMExchange says the SLC costs over 4x more.

I know ECC and advanced wear-leveling can help, but all this adds extra complexity to the system and introduces more points of failure (see all the SSD firmware problems, for example).
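A rough way to see the trade described in this comment: each extra bit per cell doubles the number of threshold-voltage levels that must fit in the same window, so the spacing between adjacent levels shrinks much faster than capacity grows. The sketch below assumes the levels are spread evenly across a fixed window; the 4.0 V window and the function name `cell_tradeoff` are invented for illustration and do not come from any datasheet.

```python
def cell_tradeoff(bits_per_cell, window_volts=4.0):
    """Return (levels, spacing) for a cell storing bits_per_cell bits.

    window_volts is a hypothetical usable threshold-voltage window,
    assumed to be divided evenly between the levels.
    """
    levels = 2 ** bits_per_cell               # distinct charge states needed
    spacing = window_volts / (levels - 1)     # gap between adjacent levels
    return levels, spacing


for bits in (1, 2, 3):
    levels, spacing = cell_tradeoff(bits)
    print(f"{bits} bit(s)/cell: {levels} levels, "
          f"~{spacing:.2f} V between adjacent levels")
```

Under these assumptions, going from 1 to 2 bits per cell doubles capacity but leaves only a third of the original spacing between levels, which is one simple way to picture why a small amount of charge loss matters more in MLC.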


You could almost think of it as the MLC part being discounted for its lesser reliability. There are a few good systems for counteracting that lesser reliability, but fundamentally, you're right - it's just another subset of failure points you're introducing.

There are some interesting options for counteracting the higher MLC failure rates - Yu Cai and the VLSI Design Group at Carnegie Mellon have produced some pretty interesting work hashing out the problems with MLC Flash and how to counteract them. For example: the in-place reprogram they've suggested would really help counteract retention errors in MLC, which are the most frequent kinds of errors in that medium.


Or the opposite, the SLC carrying a price premium because it's bought by parties that need the extra reliability and thus have a higher budget.


This is a great post, and really enhances my otherwise-rudimentary knowledge of the failure physics of NAND flash. Program disturb, in particular, is something that I didn't really have a good understanding of.

It looks like the author submitted this; if you're reading, I wrote an article a while ago [1] with some of my reverse-engineering of a NAND flash part, and I'd be interested to see which parts you think are on point and which parts seem totally wrong. I'm looking forward to your next article about device management!

[1] http://www.joshuawise.com/projects/ndfslave -- HN discussion: https://news.ycombinator.com/item?id=8133450


Hey Josh! I saw your original post, and I was fucking blown away by it - very nice work. It's been a while since I read it, but I seem to recall that you had the bulk of it right. I'm planning on writing another post after this one talking about some basic device management methods, and then working my way up to how they fit into embedded Flash filesystems. One of the things I'd like to do as part of this is write up the BCH error correction algorithm that's commonly implemented in NAND to check for bad bits - it's only slightly more complicated than the row/column parity algorithm you mentioned in your post.

Very glad you liked the post, and flattered that you got something out of it. I saw on your site that you're a CMU graduate - did you happen to take any classes or do any research with Dr. Yu Cai? Never met him myself, but he's written some great papers on NAND Flash device physics that I've referred to frequently in studying NAND.
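For readers who haven't seen row/column parity, here is a minimal sketch of the general idea. It is a simplified illustration, not the exact layout from the linked article or from any real NAND controller, and it is not BCH. It treats each byte as a row and each bit position as a column; the function names are hypothetical, and the stored parity itself is assumed to be read back without errors.

```python
from functools import reduce


def parity_bits(data: bytes):
    """Return (row_parity, col_parity) for a block of bytes.

    row_parity[i] is the parity of byte i. col_parity is the XOR of all
    bytes, so bit j of col_parity is the parity of bit j across the block.
    """
    row_parity = [bin(b).count("1") & 1 for b in data]
    col_parity = reduce(lambda a, b: a ^ b, data, 0)
    return row_parity, col_parity


def correct_single_bit(data: bytes, stored):
    """Repair at most one flipped data bit using the stored parity.

    Raises ValueError when the mismatch pattern is not a single-bit error.
    """
    stored_rows, stored_col = stored
    rows, col = parity_bits(data)
    bad_rows = [i for i, (a, b) in enumerate(zip(rows, stored_rows)) if a != b]
    col_diff = col ^ stored_col
    if not bad_rows and col_diff == 0:
        return data  # parity matches, nothing to fix
    if len(bad_rows) == 1 and bin(col_diff).count("1") == 1:
        fixed = bytearray(data)
        fixed[bad_rows[0]] ^= col_diff  # flip the one bit both parities point at
        return bytes(fixed)
    raise ValueError("uncorrectable: not a single-bit data error")


original = b"NAND flash"
stored = parity_bits(original)
corrupted = bytearray(original)
corrupted[3] ^= 0x10  # simulate one flipped bit
print(correct_single_bit(bytes(corrupted), stored) == original)
```

The failing row parity identifies the byte and the failing column parity identifies the bit within it, which is why this scheme can correct one bad bit but only detect some multi-bit patterns.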


Remember self-healing NAND Flash with embedded heaters? What happened to that? Too reliable for today's planned obsolescence?

http://www.extremetech.com/computing/142096-self-healing-sel...


Interesting! My understanding of Flash (and semiconductor devices in general) suggests that higher temperatures mean higher electron energies - in the case of NAND, that means higher leakage currents from the floating gates to other bodies in the device.

I'll have to dig more on this - thanks for sharing!
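The temperature dependence of leakage mentioned above is often modeled with an Arrhenius acceleration factor. The sketch below is only that generic model; the activation energy of 1.1 eV is an assumed placeholder, not a measured value for any particular part, and the function name is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_use_c, t_stress_c, activation_energy_ev=1.1):
    """Return how much faster a thermally activated process runs at
    t_stress_c than at t_use_c (both in degrees Celsius).

    activation_energy_ev is an assumed value chosen for illustration.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_use_k - 1.0 / t_stress_k
    )
    return math.exp(exponent)


print(arrhenius_acceleration(40.0, 85.0))
```

Under this model, charge loss speeds up sharply as temperature rises, which matches the intuition in the comment; it says nothing by itself about whether a brief high-temperature anneal could repair oxide damage.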



