Show HN: NebulaDB, my first attempt at a database (github.com/incrediblesound)
30 points by jhedwards on April 18, 2015 | 35 comments


How Does it Work?

NebulaDB manages a C file that is full of structs that have arrays of pointers to links. When you save a query, new structs are added to the file along with C functions that configure the links. When you query the data, the function that corresponds to your query is appended to the C file, the file is compiled, and the output is parsed into JSON downstream.

Seriously?


Probably just a fun project. Don't get your panties in a wad.


I don't. But I interpret posting something here as a request for comments and this was the obvious one for me because there is no clear indication that this is just a fun project.

The only thing I sometimes worry about - and this is quite a good example - is whether it is really a good idea to have all the weekend fun projects in public. You want to know if there are some interesting new databases or libraries for something? Search for it and you get an ever-increasing list of results with 99% useless weekend hacks. I am not sure how it will turn out in the long run, but there might be some real danger of polluting the internet with ever more useless stuff and bad code.


You think the Internet isn't already 99% useless garbage? Sifting through it is the entire point of intelligent search engines.


Of course, it is. But when it comes to source code it is a pretty recent development that everything ends up in the public.


Maybe someone should set up a database linking Key: <type of library> with Value: <list of libraries that do some thing>, sorted by rank voted on by the community.


This isn't exactly what you had in mind, but if you follow enough of the links...

https://github.com/asciimoo/ListOfGithubLists


Hey that's great! Bookmarked (and GH starred!)


I highly doubt people will have trouble distinguishing my project from production-ready databases.


Of course not and I didn't want to imply that. But I tried to survey the database landscape a year or so ago and a lot of the effort went into figuring out whether a system deserves a closer look or is an abandoned weekend hack. But there is obviously no obligation for anyone to make my work easy.

And the comment was not specifically aimed at your project, but is a very general thought about potentially negative effects of having an ever increasing amount of code in the public. Similar to the situation for satellites and space junk.


Sorry about all of that free labour and code being offered to you. We'll all try to tone down the creativity so you don't have to think critically in the future


Is that your idea of a "comment"?

"Seriously?"

Perhaps going beyond the first word and explaining your thoughts would have been worth more of everyone's time?


See my other comment [1], I don't think I could have done it much better.

[1] https://news.ycombinator.com/item?id=9403780


OK I guess I see your point. I understand that you dislike seeing an experiment like this walking around like a real database, but on my end I don't see the downside of trying something that's radically different from the established dogma. I'll probably read that book you recommended and try something more orthodox next time for good measure, but in the meantime I'm still really interested in exploring the limitations of this design.


Here are some things you will probably notice. Start with a really small database, maybe 10,000 facts. How long does a query take? Half a second? One second? I am just guessing here. It will take less than a millisecond on a usual SQL database.

Now increase the size of the database to, say, 100,000 facts. This makes essentially no difference for a normal database, because it will likely answer the query using an index in time logarithmic in the number of facts. NebulaDB, however, will take ten times longer to answer the query, because GCC has to compile a file ten times larger.

If there is some quadratic code somewhere in the system it may actually take a hundred times longer. Every query requires processing all the data while a usual database will only touch the data relevant to answer the query.
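To make the difference concrete, here is a minimal Python sketch. It is not how NebulaDB or any SQL engine is implemented; the `facts` list and both lookup functions are made up for illustration, and a sorted list with binary search stands in for a real index.

```python
import bisect

# Hypothetical data set: 100,000 facts keyed by a sorted integer id.
facts = [(i, f"fact-{i}") for i in range(100_000)]
keys = [k for k, _ in facts]  # the sorted keys act as a very simple index


def scan_lookup(key):
    """Full scan: looks at facts one by one and counts how many it touched."""
    steps = 0
    for k, value in facts:
        steps += 1
        if k == key:
            return value, steps
    return None, steps


def index_lookup(key):
    """Binary search over the sorted keys: about log2(n) comparisons."""
    pos = bisect.bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return facts[pos][1]
    return None
```

Looking up the last key with `scan_lookup` touches every fact, while `index_lookup` needs roughly 17 comparisons for 100,000 entries, and only a few more if the data grows tenfold.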

Another thing is updates and deletions. I did not read the code in enough detail, but can you support them? If yes, it sounds a bit like a nightmare to me to find and remove or update the relevant lines in the C source code file. And it will very likely be a slow process, because if the length of a fact changes you will have to move all the later facts.


So I timed the compile step and it gets quite long at one second for 10,000 facts, about 1/10th of a second per 1000 entries. I wasn't expecting my first attempt at a database to be able to handle 100,000 facts or to be able to compare with SQL, but I did think GCC would be a little bit faster than that. Interestingly, when I created arbitrary relations between the data and queried them, the queries themselves executed in 5ms no matter how large the data set. I think I might convert nebula back into a graph-based logic programming language and try to make a triple-store without a compile step.
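As a rough extrapolation of that measurement, here is a small Python sketch. It assumes the compile time keeps growing linearly at about 100 ms per 1,000 facts, which is only a guess beyond the 10,000 facts that were actually timed; the function name is made up for illustration.

```python
# Assumption: compile time grows linearly at the measured rate of
# roughly 100 ms per 1,000 facts (1 second for 10,000 facts).
MS_PER_1000_FACTS = 100


def estimated_compile_ms(fact_count):
    """Estimated compile time in milliseconds under the linear assumption."""
    return fact_count * MS_PER_1000_FACTS // 1000


for n in (10_000, 100_000, 1_000_000):
    print(n, "facts ->", estimated_compile_ms(n) / 1000, "seconds")
```

Under that assumption the compile step alone would take about ten seconds at 100,000 facts, regardless of how fast the query itself runs.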


Just to put this into perspective, 100,000 facts is a tiny amount for a database. If you use it for a blog having 250 users with 20 articles per user and 20 comments per article, you have already reached 100,000 comments. If you add the date of the comment and the user writing it, you are at 300,000 facts. And we haven't yet stored any information about the users or articles.
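The arithmetic behind those numbers, as a short Python sketch. The variable names are made up, and counting three facts per comment (text, date, author) is one reading of the example above.

```python
users = 250
articles_per_user = 20
comments_per_article = 20

# Every article of every user gets the same number of comments.
comments = users * articles_per_user * comments_per_article

# Assumption: one fact for the comment itself, one for its date,
# one for the user who wrote it.
facts_per_comment = 3
total_facts = comments * facts_per_comment

print(comments, total_facts)  # prints: 100000 300000
```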


"Seriously" is hardly a constructive comment


While it has a negative connotation, it was a serious question because it was not obvious to me whether this is a serious project or not. But besides that, what should a constructive comment on this look like? The design is just bad and I really hate watering down opinions. I prefer people telling me a thousand times that I am the worst singer in the world over ending up at American Idol with the illusion that I am a good singer. Be as constructive as possible but don't let this make you dishonest.


I would have preferred you said something like "hey nice try, but the overhead of compiling a C file every time you query will quickly make the database unusable" or "this design limits the database to very small data sets" or anything, really, that could help me understand what is so bad about the design. It was an experiment, it seemed to work well, so I posted it here for feedback. I wouldn't mind if someone told me it was terrible as long as they did it respectfully and gave me some clue as to why they thought so.


Let me try to show you my point of view. I see a link on Hacker News about a new database. Now I think there is someone who is serious about developing a database and is confident enough about his work to present it to the public and to this forum in particular. I naturally expect that the author has a good understanding of database systems because I think he will have researched the topic before he started building something new. And there is really a lot of literature on databases.

The dominant standard, the relational model, now often more or less synonymous with SQL, is 45 years old. The first »modern« database systems are more than 50 years old, like IDS [1], released in 1964. But even those are usually referred to as 3rd generation systems with a decade or so of development behind them. And if you want to, you could go even further back, at least as far as 1884, 130 years, and the Hollerith machines which are at the roots of IBM.

And with this expectation I looked at your project and found something that is probably one of the most convoluted ways to implement a database that I have seen. Really, don't take it personally, but there is not much that resembles a usual database system design and there is nothing where I honestly could have said that it is a nice idea but it could be improved by doing this or that. I could have suggested reading the Wikipedia article on databases [2] and working from there, but would this really have sounded less rude?

[1] http://en.wikipedia.org/wiki/Integrated_Data_Store

[2] http://en.wikipedia.org/wiki/Database


That doesn't sound very efficient. OP, what is the rationale behind this?


If OP just made it for fun, it doesn't have to be efficient


Sure, but it's such an odd design choice that one can't help but wonder why, even if it was for fun only.


People come up with all kinds of "odd choices" when they haven't been indoctrinated with existing dogma (I don't necessarily mean that in a negative way, despite the connotations). Sometimes that's how new breakthroughs happen, by bringing fresh eyes to bear on an old problem.

Whether sharing this particular example here was brave or stupid is up for grabs though... ;) (No offense to the OP intended. I think it's an interesting example of thinking outside the box, and I'm sure you're learning a lot more than you would going through a bunch of online tutorials!)


Well, I've only been programming for two years and I'm new to advanced concepts like compiling etc. There aren't really simple tutorials on how to make a database afaik, so I figured I'd just think something up and try it out. I realized it was an inefficient design, but I didn't know if it would be unusably inefficient or just slow compared to programs written by teams of CS majors. Only one way to find out, as they say.


There are a lot of books on database system design like »Fundamentals of Database Systems« [1] by Elmasri and Navathe. The reason there is no such thing as a simple tutorial is that any serious attempt to build a database system will consume on the order of tens or hundreds of man-years of effort if you start from scratch.

[1] http://www.amazon.com/Fundamentals-Database-Systems-Ramez-El...


> Well, I've only been programming for two years and I'm new to advanced concepts like compiling etc.

What exactly have you been doing for two years if you think compiling is an "advanced concept"?

By the way, your functions go in the .c file, not in the .h file.


Weird, most programmers I know consider compilers to be at least somewhat advanced. To answer your question, two years ago I knew nothing about programming, I was poor, living in China, with no marketable skills. I had to teach myself everything in my spare time. In that time I mostly studied frameworks, algorithms, data structures, the quirks of JS etc. I managed to do well enough that I'm now working 9-5 in silicon valley making UI for medical software. Unfortunately, I have never had the opportunity to learn about compilers or database internals.


I guess he referred to using a compiler, not writing one. The latter is obviously one of the more advanced topics in computer science.


Oh, how I love fetching a package via NPM, thinking it's gonna be JS-only, and then being greeted by compiler errors because I don't have a C compiler set up on my Windows machine. Without falling on my face or reading the very end of the README, I would not know that I need C to run this Node package. Especially since the README says

> NebulaDB runs on a Node server.

I wish there was a way to declare non-Node dependencies in the package.json.


Ignore the negativity in this thread, OP. The rough design you've come up with is called a "triple store". You might find it interesting to learn about how powerful the representation is. Start here: http://en.m.wikipedia.org/wiki/Triplestore
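For a feel of the idea, here is a minimal in-memory triple store sketched in Python. It is an illustration only, not NebulaDB's design or a real triplestore API; the `TripleStore` class, its method names, and the sample data are all made up.

```python
class TripleStore:
    """Stores (subject, predicate, object) triples and matches patterns."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None is a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Hypothetical sample data.
store = TripleStore()
store.add("alice", "->", "admin")
store.add("bob", "->", "user")
store.add("carol", "->", "admin")

# "Everything that points at admin": wildcard subject, fixed object.
admins = store.match(obj="admin")
```

This version scans every triple on each query; real triple stores keep indexes over the three positions so that a pattern lookup does not have to touch all the data.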


Your 'syntax' is very hard to read:

['* ','->','admin']

Couldn't you write a little parser to understand this:

" * -> admin"

I doubt the overhead would be much.

[edit] a * is hard to get into a HN comment
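A parser like that could be as small as the following Python sketch. The function name `parse_query` is made up, and it assumes the three parts never contain whitespace themselves, which may not hold for real NebulaDB data.

```python
def parse_query(text):
    """Split a query such as "* -> admin" into [source, relation, target]."""
    parts = text.split()  # splits on any whitespace, drops leading/trailing
    if len(parts) != 3:
        raise ValueError(f"expected 3 tokens, got {len(parts)}: {text!r}")
    return parts
```

Calling `parse_query(" * -> admin")` returns `['*', '->', 'admin']`, the same shape as the array syntax, and the cost of one string split is negligible next to running the query.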


Good idea, I think I'll do that. Thanks!


It looks like a "datalog database"



