
Note that that's a bug in the client, in the Zig-Java FFI code, which is inherently unsafe. We'd likely have made a similar bug in Rust.

Which is, yeah, one of the bigger technical challenges for us: we ship language-native libraries for Go, Node, Java, C#, Python, and Rust, and, like in the Tolstoy novel, each one is peculiar in its own way. What's worse, they aren't directly covered by our deterministic simulator. That's one of the major reasons why we invest in full-system simulation with Jepsen, Antithesis, and Vortex (https://tigerbeetle.com/blog/2025-02-13-a-descent-into-the-v...). We are also toying with the idea of generating _more_ of that code, so there's less room for human error. Maybe one day we'll even do fully native clients (e.g., pure Java, pure Go), but we are not there yet.

One super-specific in-progress thing is that, at the moment, the _bulk_ of the client testing is duplicated per client, and also the _bulk_ of the testing is example-based. Building a simulator/workload is a lot of work, and duplicating it for each client is unreasonable. What we want to do here is to use a multi-process architecture, where there's a single Zig process that generates the workloads and generates interesting sequences of commands for clients, and then in each client we implement just a tiny "interpreter" for the workload language, getting a test suite for free. This is still WIP though!
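To make the generator/interpreter split concrete, here is a minimal sketch in Python. It is not TigerBeetle's actual design or API: the JSON-lines command format, the `generate_workload` and `interpret` functions, and the in-memory `FakeClient` are all hypothetical, and both halves run in one process here, whereas the real plan is a separate Zig process feeding each client.

```python
import json
import random


def generate_workload(seed, count):
    """Generator side (hypothetical): emit a deterministic list of JSON commands."""
    rng = random.Random(seed)
    lines = []
    next_id = 1
    for _ in range(count):
        # Always create the first two accounts so that transfers have two parties.
        if next_id <= 2 or rng.random() < 0.3:
            cmd = {"op": "create_account", "id": next_id}
            next_id += 1
        else:
            debit, credit = rng.sample(range(1, next_id), 2)
            cmd = {
                "op": "create_transfer",
                "debit": debit,
                "credit": credit,
                "amount": rng.randint(1, 100),
            }
        lines.append(json.dumps(cmd))
    return lines


class FakeClient:
    """Stand-in for a language-native client (hypothetical, in-memory only)."""

    def __init__(self):
        self.balances = {}

    def create_account(self, account_id):
        self.balances[account_id] = 0

    def create_transfer(self, debit, credit, amount):
        self.balances[debit] -= amount
        self.balances[credit] += amount


def interpret(lines, client):
    """Interpreter side: the only part each client language would reimplement."""
    for line in lines:
        cmd = json.loads(line)
        op = cmd["op"]
        if op == "create_account":
            client.create_account(cmd["id"])
        elif op == "create_transfer":
            client.create_transfer(cmd["debit"], cmd["credit"], cmd["amount"])
        else:
            raise ValueError(f"unknown op: {op}")


if __name__ == "__main__":
    client = FakeClient()
    interpret(generate_workload(seed=42, count=50), client)
    # Transfers only move money between accounts, so balances must sum to zero.
    assert sum(client.balances.values()) == 0
```

The point of the split is that all the interesting logic (what to generate, from which seed) lives in one place, while each client only needs a dispatch loop like `interpret`. A failing seed can then be replayed against any client.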

Regarding the broader memory safety issue in the database: we did have a couple of memory safety bugs, which were caught early in testing. We also had one very bad aliasing bug, which would have been totally prevented by Rust, and which slipped through the bulk of our testing and into the release (it was caught in testing _after_ it was introduced): https://github.com/tigerbeetle/tigerbeetle/pull/2774. Notably, while the bug was bad enough to completely mess up our internal data structure, it was immediately caught by an assert down the line, which downgraded it from a correctness issue to a small availability issue (just restarting the replica would fix it). Curiously, the root cause for that bug was that we over-complicated our code. Long before the actual bug, we felt uneasy about the data structure in question and thought about refactoring it away (that refactor is now underway; hilariously, it looks like just "removing" the thing without any other code changes improves performance!).

So, on balance, yeah, Rust would've prevented a small number of easy bugs, and one gnarly bug, but then the entire thing would have to look completely different, as the architecture of TigerBeetle is not at all Rust-friendly. I'd be curious to see someone replicate the single-threaded, io_uring, no-malloc-after-startup architecture in Rust! I personally don't know off the top of my head whether that would work or not.



I remember reading a similar thing about FoundationDB with their DST a while back. Over time, they surfaced relatively few bugs in the core server, but found a bunch in the client libraries because the clients were more complicated and were not run under their DST.

Anyways, really interesting report and project. I also like your YouTube show - keep up the great work! :)


Oh, important clarification from andrewrk (https://lobste.rs/c/tf6jng), which I totally missed myself: this isn't actually a dereference of an uninitialized pointer, it's a deref of a pointer which is explicitly set to a specific, invalid value.


This is indeed an important point. The way I originally understood the bug was that the memory was not initialized at all. Thanks for the clarification.


well, per the Zig spec, any program that relies on that "explicitly set [and] specific" value of 0xAA isn't valid, so it's absolutely a bug



