
I had to roll my own transactional database for an embedded product working in a very constrained environment (2MHz ARM, 2MB total memory). I did a very extensive search and found no product with the combination of characteristics I was looking for:

- working in very small but, more importantly, constant memory (predictability is a must for a reliable embedded app),

- providing provable transactional characteristics, with a guarantee to roll back to the previous state in case a transaction is interrupted and still be consistent and able to accept new transactions,

- minimum storage overhead -- the amount of overhead directly translated to the competitiveness of the device at the time, as multiple vendors were trying to provide their own solutions for the same preexisting platform,

- a storage write pattern that would provide wear leveling for the underlying flash.

I ended up writing my own database (less than 1k LOC total, with data encoded/decoded using BER-TLV) that met all the characteristics, took a few tens of bytes of stack, and added a few bytes of overhead per write. The database would vacuum itself and coalesce record state automatically. It had some O(n^3) algorithms but THAT'S OK since the amount of data could never be large enough to pose any problems.
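To illustrate the kind of framing involved, here is a minimal sketch of BER-TLV parsing (definite lengths, single-byte tags assumed for brevity); this is illustrative only, not the original implementation:

    /* Minimal BER-TLV element parser: tag, length (short or long form), value.
       Single-byte tags are assumed for brevity; not the original code. */
    #include <stddef.h>
    #include <stdint.h>

    /* Parse one TLV element from buf; returns bytes consumed, 0 on error. */
    size_t tlv_parse(const uint8_t *buf, size_t len,
                     uint8_t *tag, const uint8_t **val, size_t *val_len)
    {
        size_t pos = 0, l = 0;
        if (len < 2) return 0;
        *tag = buf[pos++];
        if (buf[pos] < 0x80) {            /* short form: length fits in one byte */
            l = buf[pos++];
        } else {                          /* long form: 0x80 | count of length bytes */
            size_t n = buf[pos++] & 0x7F;
            if (n == 0 || n > sizeof(size_t) || pos + n > len) return 0;
            while (n--) l = (l << 8) | buf[pos++];
        }
        if (pos + l > len) return 0;
        *val = buf + pos;
        *val_len = l;
        return pos + l;
    }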

The project took 2 years to complete. I spent maybe a month designing, implementing and perfecting the database. I wouldn't say Akin's law of spacecraft design applies here. I would probably have spent more than that integrating an existing product, and would have ended up with an inferior product anyway.



The title is misleading - it's actually about how and why they did end up writing their own db. From the article:

> There’s an old adage in software engineering — “never write your own database”. So why did the OneNote team go ahead and write one as part of upgrading OneNote’s local cache implementation?


What I am really objecting to is those hard "rules". "Never optimize early". "Never roll your own database", etc.

All those rules work for most but not all projects. It's the same as saying "You shall always obey traffic rules". Maybe I should, but sometimes I may not want to brake on a yellow light when I have a clearly impatient driver tailgating me.

As we gain experience we learn the world is not black and white. Akin's law #8 says:

"In nature, the optimum is almost always in the middle somewhere. Distrust assertions that the optimum is at an extreme point."


The article deliberately contradicts the title; it's literally stating the same thing you're saying, but with a real-world example (OneNote) and why they needed to write their own.


I'm fine with those rules. Only once you understand why the rule exists (on a technical level) can you break it. Because anyone who knows all the edge cases, filesystem issues, contention issues, deadlocking and integrity guarantees with databases will search hard for an off-the-shelf solution that fits his requirements, and will consider rolling his own a last resort.


But your objection is rather silly!! Those rules are meant for beginners. If you're not a beginner you already know when/if/how you can break the rules. It is correct to teach kids that they should tell the truth in every scenario, even if as adults we don't.


Exactly. Fully understand why you should never do x, so that you fully understand when you actually have to do x.


values > principles > practices

You are free to violate any of those on the right in the interest of the ones to the left.


"It is correct to teach kids that they should tell the truth in every scenario, even if as adults we don't."

Not correct. Example: a stranger asks "are you alone at home?" or "what is the keycode for the door?", etc.


I don't agree that these are counterexamples; there's a difference between telling the truth and telling the truth someone else wants to hear.

“I am not going to answer that” is a true, correct, and appropriate response to both of those questions.


But to a potentially hostile stranger it is better to lie than to tell the truth. More extreme examples come to my mind, with an active aggressor wanting to know something, where it is smarter to fool the enemy than to tell the truth: "I don't know the code" instead of "I am not allowed to tell".

But besides, most kids learn to lie pretty well to circumvent various restrictions - learning from the adults who tell them not to lie, but do it themselves.

I am in favor of leading by example. If I am lying, how can I possibly demand that kids do otherwise? When they find out, they lose confidence in me. But when I tell them to only lie in extreme situations to "enemies", and act accordingly, they are much more likely to become truthful by default as well.


I think it's still correct. Sorry, I don't find your examples to be a reason to teach children that truth is optional.


It would be much better to teach children that the truth doesn't exist, that everything we call "reality" is a lie on some level, and that the best you can do is to stay honest to yourself and your intentions. Probably much harder to teach, but much closer to what they will fight with for the rest of their lives. I think it's the lack of such preparation in childhood which makes people give up, either never trying to figure out what the "truth" even is or relying on an imaginary friend in the sky as a source of one and only truth.

Some simplifications are useful in some contexts, but the world is an unimaginably complex thing and trying to dumb it down can only take you so far.


"the truth doesn't exist, that everything we call "reality" is a lie on some level"

Wait so does this truth exist, or is it also a lie on some level?


> does this truth

It's not truth at all. Just an observation. It happens to fit with my perceptions. That's it.

You can artificially define truth by tying it to a particular frame of reference, but that's not "the truth", as that frame of reference is not 100% transferable to others anyway. The idea of umwelt, as I understand it, seems to work here. Still, it's just an observation, an impression.

I'm not really saying anything new here. Descartes was saying something similar quite a long time ago. Then again, he could have meant something else entirely and there's no way for me to know which it is. I just assume my understanding is close enough to the intended meaning. I may well be completely wrong on this.

Basically, there's nothing you can be really sure about, including the fact that you can't be really sure about anything. There are only things that appear to work for you and, possibly, others. You can use them. Just don't believe them unquestioningly.


Never said that. Default is truth. But there are rare exceptions.


If you didn't then I don't even know what you're saying, or why you're disagreeing.

Me: It is correct to teach kids that they should tell the truth in every scenario, even if as adults we don't.

You: Not correct. [...]

Exceptions exist in every situation for every single thing you say or do or think. Pointing out exceptions doesn't get you anywhere. Every adult knows this. A child's mind doesn't, and depending on the maturity level, cannot comprehend them. There is an established, successful method of instruction: start with simple rules/laws/examples, and then layer complexity later on. All of our systems of education are based on this. Nobody tells a child to also look up for a possible meteor, or a collapsing skyscraper, or a vulture about to attack them, before crossing the street. We just teach them to look both ways. Programmers are taught to take for granted that a bit in memory retains its value once set. When you're learning how to program you don't need to account for CPU bugs or cosmic radiation.


"If you didn't then I don't even know what you're saying, or why you're disagreeing.

Me: It is correct to teach kids that they should tell the truth in every scenario, even if as adults we don't." teach children that truth is optional."

Optional means maybe. I said the default is truth. And when I say every, I mean every. But there are exceptions, in the case of enemies. So if they lie to someone, it means this one is a (temporary) enemy. Which is a serious implication. They do understand that, usually.

They also understand the concept of friend and enemy very early, I bet you agree. (Not that they can always correctly sort it out, but we adults can't always do that either.)

So they very intuitively understand the exception in the case of a meeting with a potentially dangerous person and, I bet, instinctively act accordingly and not tell him where others are hiding, for example. (Or break down and cry, also a valid strategy.)


Sorry, I don't agree with your comment. No point in going round and round. Good luck!


But it is much more likely for a kid to be approached by strangers who ask them if they are alone than to be hit by a meteor. So I say it does make sense to explain to them the concept of a complex world and different rules for different situations, as early as they can get into situations where they have to look out for themselves.


"Never roll your own authentication" seems to hold up pretty well though...


“Never use absolutes.” - Brightball’s Law


"Never use absolutes." is absolute in itself


That's the joke.


What a smart observation


Exactly. Exceptions prove the rule. But this applies to everything, not just databases. Never write your own OS, unless you are faced with an exception where you have to.

If you have a special case where an RDBMS can't fill your need, then you obviously have to build your own. But these cases are so rare that it proves the rule.


It's "funny" how you (and many others too) didn't read the article and just commented based on the title.


It's as if RTFA was a real thing.


Is this something publicly available? I am currently hunting around for an embedded database with amazingly similar requirements, and have found nothing. I started my own db, but I'd love to be able to take a look at what you ended up with, if possible.


It is proprietary, unfortunately, but you can pm me to discuss it.

Once you understand the problem it isn't really difficult to implement it. The trick is to decide what kind of properties you really need and only implement what is absolutely necessary to achieve it.

I did this in ANSI C using some of the stuff I had already implemented for the project. For example, I already had a BER-TLV parser/serializer with very specific properties (managing collections within a buffer of a specified size, etc.) so I reused it for the file format and then again on the application layer for the record format.

The basic database is a KV store kept in the form of an append-only log file. The entries are records of modifications performed to the database. The keys and the values are binary and their structure is managed by the application. The application supplies callbacks to perform some operations (for example, given a base version of a record and a patch, calculate the patched version of the record).

The transactions were basically an identifier and a flag (is this entry the end of a transaction?).
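A minimal sketch of what one log entry might carry (the field names and widths here are assumptions for illustration; the real format was BER-TLV framed):

    /* Hypothetical view of one log entry; names and widths are assumptions,
       the actual product framed entries with BER-TLV. */
    #include <stdint.h>

    typedef struct {
        uint16_t key_len;   /* length of the binary key                            */
        uint16_t val_len;   /* length of the binary value (the patch)              */
        uint8_t  txn_id;    /* identifier of the transaction this entry is part of */
        uint8_t  txn_end;   /* non-zero: this entry ends (commits) the transaction */
        /* key bytes, then value bytes, follow the header */
    } log_entry_hdr;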

All algorithms are very simple and focused on constant use of memory.

For example: the coalescing operation was basically reading the old file and writing live records to a new file. I would traverse the old file entirely for each record (hence the O(n^3)) and, using callbacks supplied by the application, write the end result of all committed transactions to the new file. After this was done for all records I would switch the files.

Fetching any record meant doing a linear search through all database entries to find all entries related to the record, taking the base version and then successively applying all patches until the entire database file is traversed. This is inefficient as hell, but the use case was that the database was very rarely actually accessed. The typical use was to just write data to the database, and it was only read after a failure or in some very specific background tasks.
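As a rough sketch of that fetch path (all names, signatures and the iterator helper below are assumptions, not the original code):

    /* Sketch of the fetch path described above: one linear pass over the log,
       folding every committed patch for the key into the caller's buffer via an
       application-supplied callback.  Names and signatures are assumptions. */
    #include <stdint.h>
    #include <string.h>

    typedef struct {                /* hypothetical in-memory view of one entry */
        const uint8_t *key, *val;
        uint16_t key_len, val_len;
        uint8_t committed;          /* entry belongs to a committed transaction */
    } log_entry;

    /* application callback: fold one patch into the current record image */
    typedef int (*apply_patch_fn)(uint8_t *rec, uint16_t *rec_len, uint16_t rec_cap,
                                  const uint8_t *patch, uint16_t patch_len);

    /* hypothetical iterator over the append-only file, one entry per call */
    int log_next(uint32_t *offset, log_entry *e);

    int db_fetch(const uint8_t *key, uint16_t key_len,
                 uint8_t *rec, uint16_t *rec_len, uint16_t rec_cap,
                 apply_patch_fn apply)
    {
        uint32_t off = 0;
        log_entry e;
        int found = 0;

        while (log_next(&off, &e)) {                /* linear scan, constant memory  */
            if (!e.committed) continue;             /* skip interrupted transactions */
            if (e.key_len != key_len || memcmp(e.key, key, key_len) != 0) continue;
            if (apply(rec, rec_len, rec_cap, e.val, e.val_len) != 0)
                return -1;                          /* application rejected the patch */
            found = 1;
        }
        return found ? 0 : 1;                       /* 1 means key not present */
    }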


Wonder if SQLite would work for something like that, if you can get away from SQL, i.e., strings.


It can. Just make sure you prepare() your statements. This turns them into a compiled set of VM instructions.
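A minimal sketch of what that looks like with the C API (the kv table is an assumption here, and error handling is trimmed):

    /* Sketch: compile the statement once with sqlite3_prepare_v2(), then bind and
       step it on every call, so the SQL text is only parsed a single time.
       Assumes a table kv(k, v) exists; error handling trimmed for brevity. */
    #include <sqlite3.h>

    void put_kv(sqlite3 *db, int key, const void *val, int val_len)
    {
        static sqlite3_stmt *stmt;             /* compiled once, reused afterwards */
        if (!stmt)
            sqlite3_prepare_v2(db,
                "INSERT OR REPLACE INTO kv(k, v) VALUES(?1, ?2)", -1, &stmt, 0);
        sqlite3_bind_int(stmt, 1, key);
        sqlite3_bind_blob(stmt, 2, val, val_len, SQLITE_STATIC);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);                   /* ready for the next call */
    }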


The issue being SQLite did not even remotely meet the criteria:

- it requires an operating system,

- its memory footprint, even in the best case, is way larger than the total memory available on the system,

- it can't work in a constant amount of memory (no dynamic allocations after application startup, compile-time verifiable stack usage),

- it can't provably continue from any power cycle -- there is no guaranteed automated recovery method in case of an unfortunate power cycle,

- its data storage overhead was unacceptable for the application.


You are no doubt correct about points (2) and (5). But just to set the record straight, points (1), (3), and (4) are mistaken.

The standard build of SQLite uses an OS, but there is a compile-time option to omit the OS dependency. It then falls to the developer to implement about a dozen methods on an object that will read/write from whatever storage system is used by the device. People do this. We know it works. We once had a customer use SQLite as the filesystem on their tiny little machine.

Likewise, the use of malloc() is enabled by default but can be disabled at compile-time. Without malloc(), your application has to provide SQLite a chunk of memory to use at startup. But that is all the memory that SQLite will ever use, guaranteed. Internally, SQLite subdivides and allocates the big chunk of memory, but we have mathematical proof that this can be done without ever encountering a memory allocation error. (Details are too long for this reply, but are covered in the SQLite documentation.)
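The startup-time handover of that single chunk looks roughly like this (requires SQLite compiled with SQLITE_ENABLE_MEMSYS5; the buffer size below is an arbitrary example, not a recommendation):

    /* Sketch of the "one chunk at startup" mode described above.  Needs a build
       with SQLITE_ENABLE_MEMSYS5; the 256 KB figure is an arbitrary example. */
    #include <sqlite3.h>

    static char sqlite_heap[256 * 1024];   /* the only memory SQLite will ever use */

    int db_init(void)
    {
        int rc;
        /* must run before sqlite3_initialize() / any other SQLite use */
        rc = sqlite3_config(SQLITE_CONFIG_HEAP,
                            sqlite_heap, (int)sizeof(sqlite_heap), 64);
        if (rc != SQLITE_OK) return rc;
        return sqlite3_initialize();
    }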

Finally, we do have proof that SQLite can continue after a power cycle - assuming certain semantics provided by the storage layer. Hence, the proof depends on your underlying hardware and those methods you write to access the hardware for (1) above. But assuming those all behave as advertised, SQLite is proof against data loss following an unexpected power cut. We have demonstrated this by both code analysis, and experimentally.

So probably you were correct to write your own database in this case. My point is that SQLite did not miss your requirements by quite as big a margin as you suppose. If you had had a bigger hardware budget (SQLite needs about 0.5MB of code space and a similar amount of RAM, though the more RAM you feed it the faster it runs) then you might have been able to save yourself about two years of development effort.


Thank you for your explanation.

SQLite is a fantastic product, it is just not aimed at an extremely constrained platform like the one I was working on.

The device was a platform the company had already invested in a lot (a few thousand units, committed to buying another few tens of thousands) and not cheap (a few hundred dollars per unit). The application was bid to extend the lifetime of that hardware by cramming new features into the existing platform. In the end, after considering many products, it was clear to us that every byte saved was a byte available for other features, and memory was our limiting factor. So it was not even a question of whether we could fit a particular database but how much space we would be left with for the really important features.

I don't remember object (*.o) sizes, but the entire database was 1k LOC of ANSI C while the entire application was about 70k LOC of heavily optimized code. The memory requirements were a few tens of bytes of stack (not really important, as other parts of the application were using more stack) and a hundred bytes of statically allocated memory. It even came to dumbing down algorithms to keep object sizes down. I learned a lot on that project.

I admit I did not do much research on the provable characteristics of SQLite back then (a decade ago) once it was clear it could never fit our application. The research was mostly aimed at proving we needed our own database. The management did not agree ("Never roll your own database..."), so I just ended up doing it as a skunkworks project. It worked, the product is still in use, and it has never failed a single transaction (out of tens of billions processed).

I may even write my own blog post in the spirit of the one in the title of this thread; it just never occurred to me that it would be interesting to the general public.


Sounds cool, I would be interested in a blog post. Databases for highly constrained systems sound interesting.


That's a great and informative answer; only a minor comment that I think OP said the project was 2 years, but the database portion was 1 month of that.

Still I agree better not to roll your own stuff unless it is absolutely critical.



