
There is no listed equivalent of RecordIO. What do people use for high-reliability journals?

When I needed something like RecordIO to store market data, I couldn't find anything. So I implemented https://github.com/romkatv/ChunkIO. I later learned of https://github.com/google/riegeli (work in progress), which could've saved me a lot of time if only I had found it earlier. I think my ChunkIO is better, though.



That appears to be exactly RecordIO. It even has transposition. Is there a reason that doesn't meet your requirements?

Edit: it even includes an open source implementation of Cord, which they've renamed to Chain for some reason.


> That appears to be exactly RecordIO.

I suppose you mean "exactly" in a figurative way. Riegeli is definitely inspired by RecordIO and is meant as a successor to it but it's not RecordIO.

> Is there a reason that doesn't meet your requirements?

I need to store timeseries with fast lookup by timestamp. Riegeli doesn't support this out of the box. If I had discovered it before I built ChunkIO, I probably would've pulled the low-level code out of it and added timeseries support on top. Or maybe not. Reliability is very important to me, and it's risky to use work-in-progress software that may or may not have any production footprint (I'm no longer with Google, so I don't know whether they use it internally).
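To make "fast lookup by timestamp" concrete, here is a minimal in-memory sketch of the general idea, not ChunkIO's actual design. It assumes records are appended in non-decreasing timestamp order and that the first timestamp of every chunk is known (in a real file it would come from chunk headers or a separate index). `TimeseriesLog`, `chunk_size` and `seek` are hypothetical names.

```python
import bisect


class TimeseriesLog:
    """Toy append-only log of (timestamp, payload) records grouped into chunks.

    Hypothetical design for illustration: each chunk remembers the timestamp
    of its first record, so a reader can binary-search chunk start times
    instead of scanning from the beginning.
    """

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self._chunks = []  # list of lists of (timestamp, payload)
        self._starts = []  # first timestamp of each chunk, non-decreasing

    def append(self, ts, payload):
        if self._chunks and ts < self._chunks[-1][-1][0]:
            raise ValueError("timestamps must be non-decreasing")
        if not self._chunks or len(self._chunks[-1]) == self.chunk_size:
            self._chunks.append([])  # start a new chunk
            self._starts.append(ts)
        self._chunks[-1].append((ts, payload))

    def seek(self, ts):
        """Yield every record with timestamp >= ts, in order."""
        # Last chunk whose first timestamp is strictly below ts. Every chunk
        # before it holds only older records, so it is skipped unread. Using
        # bisect_left keeps equal timestamps that straddle a chunk boundary.
        i = max(bisect.bisect_left(self._starts, ts) - 1, 0)
        for chunk in self._chunks[i:]:
            for rec in chunk:
                if rec[0] >= ts:
                    yield rec
```

The cost of `seek` is a binary search over chunk start times plus reading from one chunk onward, instead of a scan from the start of the file.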


I don't understand. RecordIO doesn't support lookup of any kind; it is a linear format. The interface of Riegeli looks to me exactly like the interface to RecordIO. All they've done is remove support for Google's abstract File* storage interface so it can be used by the public.

What you are describing sounds like SSTable. Perhaps you could benefit from LevelDB.

https://www.igvita.com/2012/02/06/sstable-and-log-structured...


RecordIO supports a form of random access lookup, although it's rarely used. Riegeli supports random access lookups as a first-class operation.


Would Riak's Bitcask format fit the bill here?

https://riak.com/assets/bitcask-intro.pdf


This format looks somewhat underpowered. If one record is corrupted, there is no way to read anything after it. For the same reason, there is no support for lookup or sharding, such as finding the first record that starts in the second half of the file. If a writer crashes, a new writer instance cannot append to an existing file without reading its whole content and truncating it after the last readable record.
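For contrast, here is a minimal sketch of block-based framing that avoids all three problems. It is a simplified take on the idea behind LevelDB's log format, not the actual layout of Bitcask, RecordIO, Riegeli or ChunkIO. The file is cut into fixed-size blocks, records are split into fragments that never cross a block boundary, and each fragment carries a CRC so a reader can drop a damaged block and resume at the next boundary. The block size, header layout and all function names are assumptions for illustration, and it uses plain CRC-32 from `zlib`.

```python
import struct
import zlib

# Hypothetical block-based framing, loosely modeled on LevelDB's log format.
BLOCK_SIZE = 64  # tiny for demonstration; LevelDB uses 32 KiB blocks
HEADER = struct.Struct("<IHB")  # CRC-32, fragment length, fragment type
FULL, FIRST, MIDDLE, LAST = 1, 2, 3, 4


def pad_to_block(out):
    """Zero-fill the bytearray `out` to the next block boundary."""
    out += b"\0" * (-len(out) % BLOCK_SIZE)


def append_records(out, records):
    """Append records to the bytearray `out`, splitting them across blocks."""
    for rec in records:
        pos, first = 0, True
        while True:
            left = BLOCK_SIZE - len(out) % BLOCK_SIZE
            if left < HEADER.size:  # a header never straddles a block boundary
                out += b"\0" * left
                left = BLOCK_SIZE
            frag = rec[pos:pos + left - HEADER.size]
            pos += len(frag)
            last = pos >= len(rec)
            kind = (FULL if last else FIRST) if first else (LAST if last else MIDDLE)
            crc = zlib.crc32(bytes([kind]) + frag)  # covers type and payload
            out += HEADER.pack(crc, len(frag), kind) + frag
            if last:
                break
            first = False


def read_records(data, start_block=0):
    """Yield intact records whose first fragment is in block >= start_block.

    A damaged fragment costs the rest of its block and the record it belongs
    to; reading resumes at the next block boundary.
    """
    pending = None  # fragments of a record in progress, or None
    block = start_block
    while block * BLOCK_SIZE < len(data):
        pos = block * BLOCK_SIZE
        end = min(pos + BLOCK_SIZE, len(data))
        while end - pos >= HEADER.size:
            crc, length, kind = HEADER.unpack_from(data, pos)
            stop = pos + HEADER.size + length
            frag = data[pos + HEADER.size:stop]
            if stop > end or zlib.crc32(bytes([kind]) + frag) != crc:
                pending = None  # corruption or torn write: abandon this block
                break
            pos = stop
            if kind == FULL:
                pending = None
                yield frag
            elif kind == FIRST:
                pending = [frag]
            elif kind in (MIDDLE, LAST) and pending is not None:
                pending.append(frag)
                if kind == LAST:
                    yield b"".join(pending)
                    pending = None
            # A MIDDLE/LAST with no preceding FIRST (e.g. after a seek) is skipped.
        block += 1
```

Because a reader can start at any block boundary and skip continuation fragments, finding the first record that starts in the second half of a file only requires passing the middle block's index as `start_block`, and shards can be assigned by block range. After a crash, a new writer can call `pad_to_block` on the existing file and keep appending without reading it: the torn tail fails its CRC and only the interrupted record is lost.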



