i think the joyent-manatee HA [https://github.com/joyent/manatee] solution is the safest way of taking a system like PG that supports synchronous replication and giving it good availability properties while preserving durability.
they basically have 3 nodes:
- the master which must synchronously commit all writes to the slave
- the slave which accepts synchronous writes from the master and may be promoted to the master
- the asynchronous slave that accepts writes asynchronously from the slave [or from the master if you support rolling back the timeline]
if the master dies, the slave can promote itself to master if it gains the support of the asynchronous slave.
if the slave dies, then the master can no longer commit, and it tries to promote the asynchronous slave to be the new slave. if there is some confusion over who is dead, the asynchronous slave basically adjudicates, because it can only be promoted to slave once, either by the master or by the original slave.
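a rough python sketch of that adjudication, just to make the "promoted only once" idea concrete. this is not manatee's actual code: `AsyncSlave`, `request_promotion` and the generation counter are names i made up for illustration, and i'm assuming both sides race against the same observed generation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AsyncSlave:
    """hypothetical model of the asynchronous slave acting as tie-breaker."""
    generation: int = 0            # bumped every time a promotion is granted
    follows: Optional[str] = None  # node this one now replicates from synchronously

    def request_promotion(self, requester: str, seen_generation: int) -> bool:
        # a request is only valid for the generation the requester observed.
        # once one request is granted the generation moves on, so the
        # competing request (made against the old generation) is refused.
        if seen_generation != self.generation:
            return False
        self.generation += 1
        self.follows = requester
        return True


# master and slave each believe the other is dead and race for the async slave
tie_breaker = AsyncSlave()
seen = tie_breaker.generation
slave_wins = tie_breaker.request_promotion("slave", seen)    # slave wants to become master
master_wins = tie_breaker.request_promotion("master", seen)  # master wants a new slave
print(slave_wins, master_wins)  # True False
```

whichever request arrives first wins and the loser can't commit anything, which is the whole point: only one side ever ends up with a synchronous partner.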
basically, writes are only accepted if at least 2 nodes acknowledge them, and the system can survive one node dying. but because you need the co-operation of 2 of the 3 nodes to actually commit data, it doesn't have the split-brain issues you get when an external system checks health and then tells nodes to stop connecting to the unhealthy master.
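here's a toy python model of the role shuffle described above. names like `Cluster`, `fail` and `can_write` are hypothetical and this is not how manatee is implemented; it just shows that any single failure still leaves a master plus a synchronous slave to ack writes, and that a second failure stops writes instead of risking durability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    """hypothetical 3-role cluster: master, synchronous slave, asynchronous slave."""
    master: Optional[str]
    sync: Optional[str]
    async_: Optional[str]

    def fail(self, node: str) -> None:
        if node == self.master:
            # the slave takes over as master, the async slave becomes its slave
            self.master, self.sync, self.async_ = self.sync, self.async_, None
        elif node == self.sync:
            # the master promotes the async slave to be its new slave
            self.sync, self.async_ = self.async_, None
        elif node == self.async_:
            self.async_ = None

    def can_write(self) -> bool:
        # a write needs acks from 2 nodes: the master and its synchronous slave
        return self.master is not None and self.sync is not None


for dead in ("a", "b", "c"):
    cluster = Cluster(master="a", sync="b", async_="c")
    cluster.fail(dead)
    print(dead, cluster.can_write())  # True for every single failure

cluster = Cluster(master="a", sync="b", async_="c")
cluster.fail("a")
cluster.fail("b")
print(cluster.can_write())  # False: only one node left, so nothing can commit
```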
i think the actual state machine used is a bit more complicated because they need to support removing/adding nodes and i think they end up requiring zookeeper in order to coordinate.
flynn [https://flynn.io/docs/databases] also has an HA solution for postgres/mysql based on this.
though i can see why this kind of setup might not perform as well as one where you accept a write once 2 out of N read replicas have committed it, especially when you already need lots of read replicas.