
Somewhat OT: I have lately been toying with the idea of removing part of the cluster logic from the DB and moving it into the client's drivers instead. Basically, it would work something like this:

You have 3 database nodes. Your application requests write access. The DB driver opens an event loop and tries to connect to all three at once. As soon as it has two live connections, it starts a distributed transaction (I know these are unreliable, but bear with me and assume they work for now; it is the same mechanism the server uses anyway). The application completes its work, and it is committed to two servers, which is the majority. The third server replicates the changes as soon as it is able, verifying that a majority of the servers agree that this is in fact a valid changeset.
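
The scheme above can be sketched roughly like this, using in-memory stand-in "nodes" instead of real database connections. The Node class and the quorum_write helper are hypothetical names of mine, not any real driver API:

```python
class Node:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.data = {}

    def write(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value


def quorum_write(nodes, key, value, required=None):
    """Commit once `required` nodes ack; default is a simple majority."""
    if required is None:
        required = len(nodes) // 2 + 1
    acks = 0
    for node in nodes:
        try:
            node.write(key, value)
            acks += 1
        except ConnectionError:
            continue  # the lagging node catches up via replication later
    if acks < required:
        raise RuntimeError("could not reach the required number of nodes")
    return acks


# Three nodes, one down: the write still commits on the majority of two.
nodes = [Node("a"), Node("b"), Node("c", alive=False)]
quorum_write(nodes, "k", 42)
```

Passing required=len(nodes) is the "lean towards consistency" tuning knob mentioned below: the write then fails unless every node is reachable.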

I think that this approach, combined with a good locking system (e.g. the client would declare all locks upfront), could result in a robust system. It also makes it easy to change where your system lies on the CAP triangle: just tune your driver to require a connection to all servers, not just a majority, to make the system lean more towards consistency, and make downed nodes that are recovering refuse connections until they have caught up.

Does anyone have any thoughts on this?



I'm pretty sure this has been tried (generalised to getting a write lock on more than N/2 of the servers), but I can't find a citation right now.


Any idea if there was any success with it?


I've found a reference to it in Jean Bacon's Concurrent Systems. She calls it "quorum assembly", but I can't find any other references using that name (the slides on cl.cam.ac.uk are by her, or influenced by her work). I'm afraid it is probably an idea that has been had many times.

The write quorum (the number of nodes you must contact to perform a write) must be > n/2. The read quorum plus the write quorum must be > n (otherwise a read could land entirely outside the write quorum).

So the n=3 case is simple: RQ = WQ = 2, as you suggested. I think it wasn't very successful in larger cases because n/2 isn't much better than n, and you have to do some kind of synchronisation between the nodes, which is tricky to get right (consider how the write nodes know that the other write nodes have done the write).

In practice, hierarchical schemes where you are reading from a slave which might be behind the master are more common. There you can have 1 write node and n read nodes. Since reads are generally more common than writes, this is great.
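
That hierarchical scheme can be sketched as one write node replicating to read-only slaves, with reads spread across the slaves. The Master/Slave classes are illustrative only; real systems replicate asynchronously, so slave reads may be stale:

```python
import itertools

class Slave:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)  # may lag the master in a real deployment

class Master:
    def __init__(self, slaves):
        self.data = {}
        self.slaves = slaves

    def write(self, key, value):
        self.data[key] = value
        for s in self.slaves:
            s.apply(key, value)  # synchronous here for simplicity

# One write node, two read nodes, with reads round-robined across slaves.
slaves = [Slave(), Slave()]
master = Master(slaves)
reader = itertools.cycle(slaves)

master.write("k", 1)
next(reader).read("k")
```

Since all writes funnel through one node, there is no cross-node synchronisation problem for writes, which is why this wins when reads dominate.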


Thanks for the details and the explanation. Makes a lot of sense.


> the client would declare all locks upfront

But if the client crashes after acquiring the locks, wouldn't your system be deadlocked?


Breaking the connection would release the locks.


TCP doesn't "break" connections like that. Cleanly closing sockets breaks connections, but machines that crash or drop off the network won't be noticed until the connection times out, which is typically on the order of many minutes.


I realize that. My point is that existing locking systems work this way. For example, MySQL releases any table locks, rolls back the transaction and drops all temporary tables whenever the connection is closed or times out. This is no worse than what we already have.
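
The session-scoped locking behaviour described here can be sketched with a toy lock manager that frees a session's locks when its connection ends, whether cleanly or via timeout. The LockManager class is a hypothetical illustration, not MySQL's actual implementation:

```python
class LockManager:
    def __init__(self):
        self.held = {}  # lock name -> owning session id

    def acquire(self, session, lock):
        if self.held.get(lock, session) != session:
            return False  # held by another session
        self.held[lock] = session
        return True

    def release_session(self, session):
        """Called when a connection is closed or times out."""
        for lock, owner in list(self.held.items()):
            if owner == session:
                del self.held[lock]

lm = LockManager()
lm.acquire("s1", "table_a")   # s1 now holds the lock
lm.acquire("s2", "table_a")   # refused: s1 holds it
lm.release_session("s1")      # connection dropped, locks freed
lm.acquire("s2", "table_a")   # now succeeds
```

The deadlock window is then bounded by the connection timeout, which is exactly the trade-off of the existing systems.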



