There is no listed equivalent of RecordIO. What do people use for high-reliabili...

shereadsthenews · on April 10, 2019

That appears to be exactly RecordIO. It even has transposition. Is there a reason that doesn't meet your requirements?

Edit: it even includes an open source implementation of Cord, which they've renamed to Chain for some reason.

romka2 · on April 10, 2019

> That appears to be exactly RecordIO.

I suppose you mean "exactly" in a figurative way. Riegeli is definitely inspired by RecordIO and is meant as its successor, but it's not RecordIO.

> Is there a reason that doesn't meet your requirements?

I need to store timeseries with fast lookup by timestamp. Riegeli doesn't support this out of the box. If I had discovered it before I built ChunkIO, I probably would've pulled the low-level code out of it and added timeseries support on top. Or maybe not. Reliability is very important to me, and it's risky to use work-in-progress software that may or may not have any production footprint (I'm no longer with Google, so I don't know whether they use it internally).
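For readers wondering what "timeseries support on top" could look like: one common approach is a sparse index that stores only the first timestamp of each chunk and is searched with binary search. The sketch below models this in memory under that assumption; ChunkedSeries, chunk_size and seek are hypothetical names, and this is not a description of how ChunkIO or Riegeli actually work.

```python
import bisect


class ChunkedSeries:
    """Hypothetical in-memory model of a timeseries log: records are appended in
    non-decreasing timestamp order and grouped into chunks of `chunk_size`.
    A sparse index keeps only the first timestamp of each chunk."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = []    # each chunk is a list of (timestamp, value) pairs
        self.first_ts = []  # sparse index: first timestamp of each chunk

    def append(self, ts, value):
        if self.chunks and ts < self.chunks[-1][-1][0]:
            raise ValueError("timestamps must be non-decreasing")
        if not self.chunks or len(self.chunks[-1]) == self.chunk_size:
            self.chunks.append([])    # start a new chunk
            self.first_ts.append(ts)  # and index its first timestamp
        self.chunks[-1].append((ts, value))

    def seek(self, ts):
        """Return the first record whose timestamp is >= ts, or None."""
        # bisect_left finds the first chunk whose first timestamp is >= ts.
        # The chunk before it starts earlier but may still contain records at
        # or after ts, so scanning starts there; older chunks cannot match.
        start = max(bisect.bisect_left(self.first_ts, ts) - 1, 0)
        for chunk in self.chunks[start:]:
            for record in chunk:
                if record[0] >= ts:
                    return record
        return None
```

With this layout a lookup touches at most two chunks: the one just before the bisection point and the one at it. In a real file the index entries would be chunk offsets rather than Python lists.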

shereadsthenews · on April 10, 2019

I don't understand. RecordIO doesn't support lookup of any kind; it is a linear format. Riegeli's interface looks exactly like RecordIO's to me. All they've done is remove support for Google's abstract File* storage interface so it can be used by the public.

What you are describing sounds like SSTable. Perhaps you could benefit from LevelDB.

https://www.igvita.com/2012/02/06/sstable-and-log-structured...

romka2 · on April 10, 2019

RecordIO supports a form of random access lookup, although it's rarely used. Riegeli supports random access lookups as a first-class operation.
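To make "random access as a first-class operation" concrete, here is a minimal sketch of one way a record format can support it, assuming a simplified block-aligned layout in which records never straddle a block boundary. A reader can then jump to any byte offset, go back to the start of the enclosing block, and find the next record without reading the file from the beginning. This is an illustrative assumption, not RecordIO's or Riegeli's actual format (the real formats are more involved); BLOCK_SIZE and the function names are hypothetical.

```python
import struct

BLOCK_SIZE = 64                # tiny for demonstration; real formats use far larger blocks
HEADER = struct.Struct(">I")   # 4-byte big-endian field storing len(record) + 1


def write_blocked(records):
    """Pack records into fixed-size blocks; no record straddles a block boundary."""
    out = bytearray()
    for rec in records:
        need = HEADER.size + len(rec)
        if need > BLOCK_SIZE:
            raise ValueError("record too large for one block in this simplified layout")
        used = len(out) % BLOCK_SIZE
        if used + need > BLOCK_SIZE:
            # Zero-fill the rest of the block; a zero header means "padding".
            out.extend(b"\x00" * (BLOCK_SIZE - used))
        out.extend(HEADER.pack(len(rec) + 1))
        out.extend(rec)
    return bytes(out)


def records_in_block(data, block_index):
    """Yield (offset, record) for every record stored in one block."""
    pos = block_index * BLOCK_SIZE
    end = min(pos + BLOCK_SIZE, len(data))
    while pos + HEADER.size <= end:
        (n,) = HEADER.unpack_from(data, pos)
        if n == 0:  # padding: no more records in this block
            break
        length = n - 1
        start = pos + HEADER.size
        yield pos, data[start:start + length]
        pos = start + length


def first_record_at_or_after(data, offset):
    """Find the first record starting at or after `offset`, reading only from
    the block that contains `offset` onward."""
    block = offset // BLOCK_SIZE
    num_blocks = (len(data) + BLOCK_SIZE - 1) // BLOCK_SIZE
    while block < num_blocks:
        for pos, rec in records_in_block(data, block):
            if pos >= offset:
                return pos, rec
        block += 1
    return None
```

The trade-off in this simplified version is wasted padding at the end of each block; in exchange, any offset (for example, the midpoint of the file for sharding) can be turned into a record boundary with a bounded amount of reading.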

mempko · on April 10, 2019

Would Riak's Bitcask format fit the bill here?

https://riak.com/assets/bitcask-intro.pdf

romka2 · on April 10, 2019

This format looks somewhat underpowered. If one record is corrupted, there is no way to read anything after it. For the same reason, there is no lookup/sharding support, such as finding the first record that starts in the second half of the file. If a writer crashes, a new writer instance cannot append to an existing file without reading its entire contents and truncating it after the last readable record.
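For contrast, here is a minimal sketch of framing that lets a reader get past a corrupted record, assuming each record carries a sync marker, a length, and a CRC-32 of its payload. The reader resynchronizes by searching for the next marker, and starting that search at an arbitrary offset also answers the "first record that starts in the second half of the file" query. MAGIC and the frame layout are hypothetical, not Bitcask's or Riegeli's actual format; the checksum makes a false marker match inside a payload unlikely, but not impossible.

```python
import struct
import zlib

MAGIC = b"\xfa\xce\xb0\x0c"    # hypothetical sync marker, not from any real format
FRAME = struct.Struct(">4sII")  # marker, payload length, CRC-32 of payload


def encode(records):
    out = bytearray()
    for rec in records:
        out += FRAME.pack(MAGIC, len(rec), zlib.crc32(rec))
        out += rec
    return bytes(out)


def decode(data, start=0):
    """Yield (offset, payload) for every intact frame at or after `start`,
    skipping damaged bytes by searching for the next sync marker."""
    pos = data.find(MAGIC, start)
    while pos != -1:
        header_end = pos + FRAME.size
        if header_end <= len(data):
            _, length, crc = FRAME.unpack_from(data, pos)
            payload = data[header_end:header_end + length]
            if len(payload) == length and zlib.crc32(payload) == crc:
                yield pos, payload
                pos = data.find(MAGIC, header_end + length)
                continue
        # Truncated header, out-of-range length, or checksum mismatch: resync.
        pos = data.find(MAGIC, pos + 1)
```

A damaged record costs only itself: the reader skips to the next marker and carries on. This sketch does not address crash recovery for appending writers, which needs its own design.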