
> I've been thinking about some approaches to parallelize decompression of a single stream, it's not easy

You saw this, right?

https://news.ycombinator.com/item?id=35915285



I consider using an index to be "cheating" - or rather, my intended use case is decompressing a stream you've never seen before, one generated by a "dumb" compressor.

That said, the approach I intend to take is similar. The idea is that one thread is dedicated to looking ahead, parsing as fast as it can (or even jumping far ahead and using heuristics to re-sync the parse state; there will be false positives, but you can verify them later) and building an index without actually decompressing, while secondary threads are spawned to decompress from the identified block start points. The hard part is dealing with LZ back-references to data that hasn't been decompressed yet. Worst-case performance will be abysmal, but I think that on most real-world data you'll be able to beat a serial decompressor if you can throw enough threads at it. A minimal sketch of the second phase is below.
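To make that concrete, here's a rough Python sketch of the second phase only, with big assumptions spelled out: the candidate offsets are taken as given from the hypothetical look-ahead thread, verification is just "attempt decompression and see if zlib rejects the parse" (which also catches references to data before the candidate, via zlib's distance-too-far error), and candidates are assumed to be byte-aligned. A real implementation needs bit-level offsets and placeholder markers for unresolved LZ references.

    import zlib
    from concurrent.futures import ThreadPoolExecutor

    def try_decompress(data, offset, limit=1 << 20):
        # Attempt raw-DEFLATE decompression at a candidate block start.
        # Returns bytes on success, or None if the heuristic candidate
        # was a false positive (zlib rejects the parse, including any
        # back-reference reaching before `offset`).
        d = zlib.decompressobj(wbits=-15)  # raw DEFLATE, no gzip header
        try:
            return d.decompress(data[offset:], limit)
        except zlib.error:
            return None

    def parallel_decompress(data, candidates):
        # Fan the look-ahead thread's candidate offsets out to workers.
        # zlib releases the GIL while inflating, so threads help here.
        with ThreadPoolExecutor() as pool:
            chunks = pool.map(lambda off: try_decompress(data, off),
                              candidates)
        return [c for c in chunks if c is not None]

Note that a candidate surviving this check isn't proof it was a real block boundary - garbage can parse as valid DEFLATE for a while - which is exactly why the verify-later step in the description above is needed.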


There's also this: https://github.com/mxmlnkn/pragzip

I did some benchmarks on some really beefy machines with 128 cores and was able to reach almost 20 GB/s of decompression bandwidth.
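If you want to try it from Python, a minimal usage sketch: the file name is a placeholder, and the parallelization keyword follows my reading of the project's README, so treat the exact signature as an assumption rather than a guarantee.

    import os
    import pragzip  # pip install pragzip

    # "example.gz" is a placeholder path; parallelization controls how
    # many decompression threads pragzip uses.
    with pragzip.open("example.gz", parallelization=os.cpu_count()) as f:
        while chunk := f.read(4 * 1024 * 1024):
            pass  # process each 4 MiB chunk here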


Interesting. It looks like https://github.com/zrajna/zindex became public about a year after my searches for parallel decompression came up empty and I started hacking on pigz.



