Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

That’s fascinating. I guess you probably can’t open source it, but I’d love to read a blog post about it.


Here's a version https://github.com/michaelgg/cidb -- Just some of the raw integer k-v storage part. It assumes you already have the hashed entries (you truncate them and the compression takes it from there). It is really what you should expect more from a college course IR project but since I never went to school... oh well.

I used this same library to encode telephone porting (LNP) instructions. That is a database of about 600M entries, mapping one phone number to another. With a bit of manipulation when creating the file, you go from 12GB+ naive encoding as strings (one client was using nearly 50GB after expanding it to a hashtable) to under a GB. Still better than any RMDBS can do and small enough to easily toss this in-RAM on every routing box.

Some day I'd like to write it in Rust and implement vectorized encoding and more compression schemes. Like an optimized SSTable just for integers.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: