I got tired of the pricing and/or complexity of running message queues/event brokers, so I decided to play around with implementing my own. It uses S3 as the source of truth, which makes it orders of magnitude easier to manage and cheaper to run. There's an ongoing blog series on the implementation: https://github.com/micvbang/simple-event-broker
I've had many use cases for using an event broker, but never found one that was simple enough that I would venture into hosting it myself, or cheap enough to rent/host that it was feasible. Once I realized that cloud object stores fit this problem perfectly (they provide durability and are cheap to use), I realized that it would be possible to write one myself. I wrote a post on it here, along with a tiny performance evaluation: https://blog.vbang.dk/2024/05/26/seb/
I spent the better part of a week working on it full time, but spread over months. I use it daily - it's serving multiple projects I built it for :)
I've been on teams where we've done this (very successfully in my opinion!) by creating helper code that automates creating a separate Postgres schema for each test, running all migrations, then running your test function before tearing it all down again. This all runs on CI/CD and developer machines, no credentials to any actual environments.
A major benefit of doing separate schemas for each test is that you can run them in parallel. In my experience, unless you have a metric ton of migrations to run for each test, the fact that your database tests can now run in parallel makes up (by a lot!) for the time you have to spend running the migrations for each test.
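A minimal sketch of what such a helper can look like in Go. All names here are made up for illustration, and the fake DB stands in for a real `*sql.DB`; an actual helper would also hook teardown into `t.Cleanup` and run each test's connections with the schema's `search_path`:

```go
package main

import (
	"fmt"
	"math/rand"
)

// execer is the subset of *sql.DB this sketch needs.
type execer interface {
	Exec(query string) error
}

// withTestSchema creates a uniquely named schema, points search_path at it,
// runs the migrations, calls fn, and drops the schema again. Because every
// test gets its own schema, tests can run in parallel against one database.
func withTestSchema(db execer, migrations []string, fn func(schema string) error) error {
	schema := fmt.Sprintf("test_%d", rand.Int63())
	if err := db.Exec(fmt.Sprintf("CREATE SCHEMA %s", schema)); err != nil {
		return err
	}
	// Drop the schema (and everything in it) when the test is done.
	defer db.Exec(fmt.Sprintf("DROP SCHEMA %s CASCADE", schema))

	if err := db.Exec(fmt.Sprintf("SET search_path TO %s", schema)); err != nil {
		return err
	}
	for _, m := range migrations {
		if err := db.Exec(m); err != nil {
			return err
		}
	}
	return fn(schema)
}

// fakeDB records statements so the sketch runs without a real Postgres.
type fakeDB struct{ stmts []string }

func (f *fakeDB) Exec(q string) error { f.stmts = append(f.stmts, q); return nil }

func main() {
	db := &fakeDB{}
	_ = withTestSchema(db, []string{"CREATE TABLE users (id int)"}, func(schema string) error {
		fmt.Println("running test in schema", schema)
		return nil
	})
	fmt.Println(db.stmts[0][:13]) // CREATE SCHEMA
}
```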
EDIT: usually we also make utilities to generate entities with random values, so that it's easy to make a test that e.g. tests that when you search for 5 entities among a set of 50, you only get the 5 that you know happen to match the search criteria.
Running all migrations before every test can take you a surprisingly long way.
Once that gets a bit too slow, running migrations once before every suite and then deleting all data before each test works really well. It's pretty easy to make the deletion dynamic by querying the names of all tables and constructing one statement to clear the data, which avoids referential integrity issues. Surprisingly, `TRUNCATE` is measurably slower than `DELETE FROM`.
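Building that statement is just string assembly once you have the table names (in Postgres you'd get them from something like `SELECT tablename FROM pg_tables WHERE schemaname = 'public'`). A rough sketch, assuming the names are already fetched; running the combined statement in one transaction is what lets the DELETEs succeed regardless of FK ordering:

```go
package main

import (
	"fmt"
	"strings"
)

// deleteAllStatement joins per-table DELETEs into a single statement.
// Executed as one statement/transaction, the DELETEs are checked together,
// which avoids referential-integrity ordering problems between tables.
func deleteAllStatement(tables []string) string {
	stmts := make([]string, len(tables))
	for i, t := range tables {
		// Quote identifiers in case a table name needs it.
		stmts[i] = fmt.Sprintf("DELETE FROM %q", t)
	}
	return strings.Join(stmts, "; ")
}

func main() {
	fmt.Println(deleteAllStatement([]string{"users", "orders"}))
	// DELETE FROM "users"; DELETE FROM "orders"
}
```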
Another nice touch is that turning off `fsync` in postgres makes it noticeably faster, while maintaining all transactional semantics.
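For a throwaway test database this is easy to set at startup. One way to do it (the container setup is just an example; `fsync`, `synchronous_commit`, and `full_page_writes` are standard Postgres settings):

```shell
# Disposable Postgres with durability settings off -- fine for tests,
# never for data you care about (an OS crash can corrupt the cluster).
docker run --rm -d \
  -e POSTGRES_PASSWORD=test \
  -p 5432:5432 \
  postgres:16 \
  -c fsync=off \
  -c synchronous_commit=off \
  -c full_page_writes=off
```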
Yeah, it's currently a one-process-army so consensus isn't a problem :D
With the current implementation you /could/ run multiple readers at the same time; the only state is files in S3. But it's a feature that just kinda happens to fall out of the current implementation, not something it was designed for :)
My strategy was to start out by implementing the underlying storage primitives first, and then look into which transport to implement later. The transport of course can have a large impact on the required storage primitives, but in my case I built it the other way around since I knew what primitives I would need in my applications.
I've been playing with the thought of implementing (parts of) the Kafka API, but I honestly haven't considered the transport that much yet :)
Reading "ensuring that data is actually written and stays written is rather difficult" immediately reminded me of https://github.com/microsoft/FASTER (it's not written in Go though), which basically deals with exactly that problem (except I think the KV store might be RAM-heavy; it's been a while since I last looked at it).