
> I’ve found that simply converting a standard, normalized PostgreSQL database doesn’t work well in practice. Instead, you need to design your database with YugabyteDB’s distributed architecture in mind.

I wanted to reply to your previous post, but ycombinator does not allow comments on older threads.

To put it simply, we spent almost three weeks trying out YB for our own use case and ran into massive performance issues.

The main issue seems to be that anything sharded (which is the default) performs sorts, filters, and other operations only after retrieving sufficient data from the shards.

The result is a massive impact on query performance. The YB team somewhat hides this by throwing large amounts of resources at Seq Scans, or by requiring you to explicitly design pre-sorted (range-sharded) indexes.
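To make that concrete, here is a minimal sketch (hypothetical table and column names; YB-specific HASH/ASC syntax) of the default hash sharding versus an explicitly pre-sorted index:

    -- Default in YugabyteDB: the first PK column is HASH-sharded, so
    -- rows are scattered across tablets and any ORDER BY has to pull
    -- data from every shard and re-sort it on the query node.
    CREATE TABLE events (
        id         bigint,
        created_at timestamptz NOT NULL,
        payload    jsonb,
        PRIMARY KEY (id HASH)
    );

    -- The workaround: declare the sort order up front with a
    -- range-sharded (ASC) index, so an ordered scan can read the
    -- index in order instead of gathering and re-sorting.
    CREATE INDEX events_created_at_idx ON events (created_at ASC);

    SELECT * FROM events ORDER BY created_at DESC LIMIT 50;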

In our experience, a basic 3-node YB cluster barely does 7,000 inserts/second, whereas a single pg instance easily does 100,000+ inserts/second (with replication). Of course there is the cheating, like "look how we insert 1 million rows/second" (3x) on 100 heavy AWS nodes, ignoring that they are just inserting rows with no generated IDs ;) Put that same workload in front of pg and you easily hit 150,000 inserts/second on a single node.
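For context, this is the difference in table shape I mean (hypothetical names): benchmark setups insert into the first, while most real applications need the second:

    -- Benchmark-friendly: the client supplies the key, so there is no
    -- ID generation on the server at all.
    CREATE TABLE bench_rows (
        id      bigint PRIMARY KEY,
        payload text
    );

    -- Realistic: the server generates the ID, which in a distributed
    -- database adds coordination overhead on top of each insert.
    CREATE TABLE app_rows (
        id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        payload text
    );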

There are issues like trigram indexes taking up 8x the space compared to pg trigrams (no row-level packing). The pre-sorting of materialized joins is not respected either, because inserts are distributed by hash: item 1 can land on node 3, item 2 on node 1, ... so everything has to be fetched and re-sorted, killing the performance benefit.
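As a sketch of what we mean (hypothetical table; YB-specific ASC syntax), the only way to keep such a materialized join physically in order is to range-shard its primary key instead of accepting the hash default:

    -- With the default PRIMARY KEY (item_id HASH), consecutive items
    -- are scattered across tablets and every ordered read has to
    -- gather and re-sort them. Range sharding keeps them in order:
    CREATE TABLE mat_join (
        item_id   bigint,
        detail_id bigint,
        data      text,
        PRIMARY KEY (item_id ASC, detail_id ASC)
    );

The trade-off is that range sharding concentrates sequential inserts on a single tablet, so you swap the re-sort cost on reads for a write hotspot.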

An optimized materialized join pulls out data in 0.2ms to 10ms, where we have seen YB take anywhere from 6ms to minutes. On the exact same dataset, inserted the same way, ... and on only 100 million rows of data, which you would expect to be exactly these databases' playground.

Then there are plugins that are simply broken in the new 2.25 version.

We lost yb-master and yb-tserver nodes multiple times. Ironically, the loss of the yb-master was not even reported in the main UI, which can only be described as amateurish.

CockroachDB gave us another series of similar issues with latency, insert speed, and more ... combined with the horrible new "we spy on you" free license that they may or may not extend every year. We found CRDB more mature in a sense, and less of a Frankenstein's monster of parts, but lacking in Postgres features (or implementing them differently).

In essence, as long as you use them as MongoDB-like denormalized databases that happen to run SQL, great. The issues really start when you combine normalization with the expectation of performance.

And the resources that both consume mean you need to scale to a 20x cluster just to reach the equivalent of a single pg node, and each of those YB/CRDB nodes needs twice the resources of the pg node.

In general, my advice is to just run pg with replication/Patroni, and maybe scale out to more pg clusters by separating tables onto different clusters. Use the built-in postgres_fdw to create materialized read nodes that offload pressure and load-balance. Unless you are running Reddit at scale, the tons of disadvantages of YB/CRDB far outweigh the benefits.
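A minimal sketch of that postgres_fdw setup, assuming a hypothetical primary at 10.0.0.1 with orders/customers tables (all names and credentials are made up):

    -- On the read node: pull tables in from the primary over
    -- postgres_fdw, then materialize the hot join locally so reads
    -- never touch the primary.
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER primary_db
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host '10.0.0.1', port '5432', dbname 'app');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER primary_db
        OPTIONS (user 'reporting', password 'secret');

    IMPORT FOREIGN SCHEMA public
        LIMIT TO (orders, customers)
        FROM SERVER primary_db INTO public;

    -- Local, pre-joined, pre-sorted copy of the hot query:
    CREATE MATERIALIZED VIEW order_summary AS
        SELECT c.id, c.name, count(*) AS order_count
        FROM customers c
        JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY order_count DESC;

    -- Refresh on whatever schedule the data can tolerate:
    REFRESH MATERIALIZED VIEW order_summary;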

The uptime, ease of upgrading, and geo-distribution are handy, not going to lie. But it is software that really only benefits companies with special demands or at very high scale, and even then. I remember that Reddit ran (still runs?) on barely a handful of DB servers.

What is even worse is that, as you stated, you need to design your database so heavily around YB/CRDB that you might as well use the above-mentioned pg solutions and get far more gains.

I hope this response is of use.


