AlloyDB for PostgreSQL under the hood: Columnar engine (cloud.google.com)
108 points by eatonphil on May 26, 2022 | hide | past | favorite | 65 comments


It wasn't clear to me how it was related to Postgres, so I spun up a cluster: AlloyDB is Postgres with a different storage engine (similar to the long-awaited zedstore project). That means all Postgres features work.


Yes, it seems to me they have taken the same approach as AWS Aurora: use the Postgres frontend and query parsing, and replace the storage layer and query execution with their own.


This is a good HTAP system for the PostgreSQL ecosystem. Hmm, what about a MySQL-compatible HTAP system? An open source project (https://www.vldb.org/pvldb/vol13/p3072-huang.pdf) seems a good fit in this area; it processes both transactional and analytical workloads in real time.


Thanks for mentioning TiDB here. It is a MySQL-compatible HTAP database, but different from AlloyDB: we do not reuse the code of MySQL/PostgreSQL, so there is no legacy code and it is more suitable for the HTAP scenario. It is easy to scale out and available on any cloud. More information here: https://docs.pingcap.com/tidb/dev/quick-start-with-htap


The mixed column and row store HTAP approach is very similar to how SingleStore works (https://www.singlestore.com/), which offers a MySQL-compatible interface with a few small differences.


MariaDB Columnstore! And now Xpand has columnar indexes.

Columnstore was previously known as InfiniDB, and Xpand as Clustrix. Both have been in production for over 10 years and I have been a very happy user of them.


Even the HTAP cloud service is available to try: https://en.pingcap.com/tidb-cloud/


HTAP?


>Recently, at Google I/O, we announced AlloyDB for PostgreSQL, a fully-managed, PostgreSQL-compatible database for demanding, enterprise-grade transactional and analytical workloads.

This is exactly what a HTAP(Hybrid Transactional and Analytical Processing) system does.


Great product positioning. I hope their follow-up is better than typical Google. The ML features are a good start, but I hope they get some BigQuery ML style model-definition-in-SQL features.

The list of supported extensions [1] is interesting and covers the biggies. Still, it always kills me that these hosted "Postgres+" solutions end up meaning no third party extensions. ZomboDB (easy ElasticSearch integration) is a PG extension I've always wanted to try, but haven't been able to.

[1] https://cloud.google.com/alloydb/docs/reference/extensions


>Great product positioning. I hope their follow-up is better than typical Google.

Looks like AlloyDB is a result of the influx of Oracle and SAP database people into GCP.


Or theft. Gitlab private projects don't offer confidentiality from the service provider, after all...


It's very frustrating.

We use ZomboDB heavily, and the lack of extension support in AWS/Google cloud offerings prevents us from switching.


I wonder if they will ever open source at least a component of AlloyDB, which could be implemented as a Postgres extension, like Citus does.


I don't think so, because it looks like they reuse some of the vectorized columnar engine that's being used for BigQuery.


Taking a look, it seems the current minimum cost (us-west-3 / Salt Lake) is around $1,090 per month (4 vCPU / 32 GB * 2 instances).


100x perf sounds too good to be true in practice.

At least in my experience, most of the time the analytical queries that generate some kind of business report, or display a statistics graph on an admin dashboard or whatever, don't need to run many times a second. If you run ad-hoc analytical queries all the time, no matter how advanced the database is, you will bring it to its knees in some way.

You can use age-old tricks to enhance both OLTP and OLAP performance: master/detail tables (3NF or maybe 4NF) to distribute columns across various tables, individual table partitioning, the advanced indexing features of Postgres, and periodic refresh of materialized views to get decent performance.
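The periodic-refresh trick above can be sketched with Python's stdlib sqlite3: precompute an aggregate into a summary table and rebuild it on a schedule, mimicking `REFRESH MATERIALIZED VIEW`. The table and column names here (`orders`, `sales_summary`) are hypothetical, just to show the shape of the approach.

```python
import sqlite3

# Toy illustration: precompute an aggregate into a summary table and
# refresh it periodically, mimicking REFRESH MATERIALIZED VIEW.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("us", 10.0), ("us", 5.0), ("eu", 7.5)])

def refresh_sales_summary(conn):
    """Rebuild the summary table; cheap dashboard reads amortize
    the cost of the periodic rebuild."""
    conn.execute("DROP TABLE IF EXISTS sales_summary")
    conn.execute("""CREATE TABLE sales_summary AS
                    SELECT region, SUM(amount) AS total
                    FROM orders GROUP BY region""")

refresh_sales_summary(conn)
rows = dict(conn.execute("SELECT region, total FROM sales_summary"))
```

Dashboards then query `sales_summary` directly, so the expensive GROUP BY over the base table runs once per refresh interval instead of once per page load.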


Interesting; they touched on a couple of pain points I've had with Spark execution. Not quite the same thing, but I've wondered why some of the ways it executes aren't smarter: Bloom filters as a cheap way to filter big joins, using single scans to do multiple jobs in the plan... I'll have to have a look.
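The Bloom-filter join trick mentioned above can be sketched in a few lines: build a compact, probabilistic membership filter over the small side's join keys, then use it to discard most non-matching rows from the big side before the real join. This is a toy sketch (the keys and sizes are made up), not how any particular engine implements it.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a cheap, probabilistic membership test.
    False positives are possible; false negatives are not."""
    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive k independent bit positions from salted blake2b hashes.
        for i in range(self.k):
            h = hashlib.blake2b(key.encode(), salt=str(i).encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

# Build the filter over the small side's join keys, then pre-filter
# the big side so the expensive join only sees plausible matches.
small_keys = {"a42", "b7"}
bf = BloomFilter()
for k in small_keys:
    bf.add(k)

big_side = ["a42", "zzz", "b7", "nope"]
candidates = [k for k in big_side if bf.might_contain(k)]
```

The win is that the filter is tiny (here 8 KB) and can be broadcast to every scan of the big table, so rows that can't possibly join are dropped before any shuffle or hash-table probe.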


How'd they make it 50-100x faster than postgres? That's some serious innovation.


It is hard to overstate how poorly optimized the Postgres architecture is for modern analytical workloads on modern hardware. It is an OLTP engine with 1990s design assumptions. You can modify it heavily to make it adequate for analytics but it won't be anything close to optimal due to architectural limitations. And I say that as a big fan of Postgres.

There is a longstanding heuristic that, outside of the workload for which Postgres was designed, a state-of-the-art database implementation will be 100x faster on the same hardware. That heuristic is well-grounded in reality in my experience; I often measure new database kernel implementations relative to Postgres. Postgres is still great for classic OLTP workloads and you will find that the gap is smaller there. For analytical processing, 100x gain is pretty straightforward.

No one should read this as reflecting poorly on Postgres. Database architecture and the hardware it runs on has evolved enormously in the decades since Postgres was designed. Most people have no sense of just how fast and scalable a modern database kernel is compared to the designs from 20+ years ago.


I did some utterly trivial benchmarks of aggregating over a (large) single table, and found PG and MSSQL to be about the same (both using row storage).

Where does your 100X slowdown figure come from, and what 'architectural limitations' are you referring to? Genuine question as DBs architectures are interesting to me. TIA


Postgres and Sql Server are both similar generations of database architecture, so I would expect them to perform similarly in some aggregate sense. For Postgres, I would argue that the major architectural limitations that impact relative performance are in three areas:

- The storage engine architecture is obsolete and highly suboptimal. It has no way to accommodate modern high-density or high-bandwidth storage, both of which are the norm now, nor is it possible to implement any of the (very successful) I/O schedule optimization concepts that have emerged in recent years. The net effect is a gross underutilization of the capabilities of modern storage hardware.

- The internal page layout and table architecture in Postgres is a textbook classic but not appropriate for most workloads on modern hardware. Modern page engines need great performance across diverse categories of workloads and data models -- we use databases for a lot more than accounting systems these days -- while also being amenable to aggressive use of hardware optimization like SIMD. Thinking in terms of "row-oriented" and "column-oriented" is a false dichotomy; optimized page engines commonly have many elements of both but are not identifiable as pure expressions of either nor even a simple hybrid (e.g. PAX layouts). This is one of the easiest changes to backport into old database architectures.

- Indexing in some modern systems has been completely reimagined to great effect. Tables are effectively index-organized with no secondary indexing structures, but without loss of high query selectivity across diverse columns; there is a separation of the concerns regarding fast search and key constraint enforcement; and there are major increases in index write throughput concurrent with major decreases in storage and memory footprint (no B-tree bloat). For small tables, this optimization has no effect and may even be a mild pessimization depending on the workload. As tables become larger the performance starts to diverge greatly, becoming multiple orders of magnitude faster. The Postgres architecture can't be modified to support this.

If you built a database engine that was approximately state-of-the-art in these three areas, I would expect a 100x difference in performance relative to Postgres for many large data model workloads using server hardware like an AWS i3en.24xlarge. Obviously building such a database would be a hell of a lot of work, it isn't something you can reasonably do for fun on nights and weekends.


Edit: this comes across rude (I suppose it is), and maybe I'm wrong to doubt you but I'm no DB n00b and pretty well everything about your post just doesn't make sense, given my knowledge.

What? [deleted] A quick browse then...

> The storage engine architecture is obsolete and highly suboptimal

cite?

> It has no way to accommodate modern high-density or high-bandwidth storage

WTF does this even mean? Storage is a hardware issue, nothing to do with the DB engine. And it has mmap and huge pages <https://www.postgresql.org/docs/current/runtime-config-resou...> so what's it missing?

> nor is it possible to implement any of the (very successful) I/O schedule optimization concepts that have emerged in recent years. The net effect is a gross underutilization of the capabilities of modern storage hardware.

No evidence given.

> Modern page engines need great performance across diverse categories of workloads and data models

It's chunks in memory that's all, how is it failing then?

> Indexing in some modern systems have been completely reimagined to great effect. Tables are effectively index-organized with no secondary indexing structures

just what?

> but without loss of high query selectivity across diverse columns

selectivity is a feature of data, not how it is organised, at all

> there is a separation of the concerns regarding fast search and key constraint enforcement

what? what? what? what? what? what? what? what? Search is a read issue, key constraints are a write issue...?

> no B-tree bloat

seriously WTF?

> As tables become larger the performance starts to diverge greatly, becoming multiple orders of magnitude faster

"multiple orders of magnitude" - Wuuuut? cite please


I'm on a long road trip, so can't spend a lot of time responding right now (gotta hit the road). But some quick points:

The page layout is important because many workloads are memory-bandwidth bound. Effective selectivity mechanisms outside the page reduce the need for intra-page selectivity optimization in terms of delivering performance but you still need to preserve memory bandwidth. This biases designs with good page-external selectivity mechanisms to optimize for widening the set of workloads they perform well on instead of squeezing out slightly more selectivity for narrow workloads.

Some of these assertions are self-evident and non-controversial, such as the poor storage performance of Postgres and the issue of B-tree bloat generally. Using mmap() for storage is the low-performance option, articles regarding which are regularly posted on HN, so the fact that it "has mmap" is a good example of why it is expected to perform poorly (though it isn't the only reason in the case of Postgres). I believe there are plans in the works to potentially redesign the Postgres storage layer in future versions, so this may improve at some point.

But more broadly, you seem to be a bit confused about the theory of database kernel design and the tradeoffs that can be made there? You are questioning things about how actual, real databases, including most open source ones, are designed.


ok

> Effective selectivity mechanisms outside the page reduce the need for intra-page selectivity optimization in terms of delivering performance but you still need to preserve memory bandwidth

"mechanisms/optimization/delivering performance" - these are just aspirational words - exactly what 'selectivity mechanisms'? I can barely make sense of this. In fact, I can't. At all.

> Some of these assertions are self-evident and non-controversial

They aren't self-evident because they aren't evident to me, and as for non-controversial - this is just ducking the question. You're still not justifying anything.

> and the issue of B-tree bloat generally

A btree page will be between 1/2 and completely full. Random insertions should make them ~75% full. In MSSQL I just created a table of ints (clustered PK) and inserted about a million random ints and had a look at fullness (page = 8060 bytes):

    - Avg. Bytes Free per Page.....................: 2046.7
    - Avg. Page Density (full).....................: 74.71%
yep, so an average of 3/4 full. A bit of expected free space. Where's the bloat you talk of, and what's the alternative structure if this is unacceptable?

Edit: this is just leaf nodes. You might claim it's ignoring non-leaf nodes. Check: 4MB data gets you 56K non-leaf. Massive 2,000 factor fanout (8K page / 4 byte ints) is why. So what bloat?

> Using mmap() for storage is the low-performance option, articles regarding which are regularly posted on HN,

for small files, so I understand. For large AFAIK they're good, which is why they're used. So what's the alternative? Point to where HN says mmap is slow for large files.

> that it "has mmap" is a good example of why it is expected to perform poorly

justify this please.

> But more broadly, you seem to be a bit confused about the theory of database kernel design and the tradeoffs that can be made there?

You may be right but you've just handwaved my doubts away without any evidence.

> You are questioning things about how actual, real, databases, including most open source ones, are designed.

I'm questioning your claims in the hope I can understand better. I'm not a total noob. I am wondering why I can't make sense of your posts.


It might be helpful to read something like

DB2 with BLU Acceleration: So Much More than Just a Column Store

https://db.cs.pitt.edu/courses/cs3551/16-1/handouts/db2BLU.p...

or

Real-Time Analytical Processing with SQL Server

https://www.vldb.org/pvldb/vol8/p1740-Larson.pdf

to get a sense of the gap between postgres/open source databases vs something designed (at least partially) to better support analytical workloads.


Do you have any good sources to seek out for learning about these improvements, such as optimized page engines and better indexing?


> both using row storage

> Where does your 100X slowdown figure come from

Given they specify analytical workloads, I'm betting a lot of it comes from column storage.


I need some ELI5 explaining: colstores allow you to pick out just the cols of data you need rather than get the full row and discard what you don't need.

If that alone explained the 100X speedup, it would mean the row is storing 99% data that's not of interest to the query. That would a) be unlike pretty much all table structures + queries I've ever seen, and b) in cases such as when a fat blob is stored in the row, you hive off blobs to a separate table and only join to it when the blob's wanted.

I'm missing something big here.


You might be interested in [0], which is one of the course readings for [1] which has video available ("Storage Models, Data Layout, & System Catalogs"). Specifically, the paper asks if you can turn a row-store into a column-store by just vertically partitioning the schema or building more indexes etc; the answer is no, and they go into various reasons why (late materialization 3x, compression 2x to 10x depending on whether query is accessing sorted data, etc).

[0] https://15721.courses.cs.cmu.edu/spring2020/papers/08-storag...

[1] https://15721.courses.cs.cmu.edu/spring2020/schedule.html


Now that is a bloody great answer! (too little karma to upvote, so my thanks instead).


Try it out yourself experimentally. You'll see the difference quickly.

https://datastation.multiprocess.io/blog/2021-10-18-experime...


Might be missing something, but an interpreted language that splashes its objects around in memory with pointers everywhere, using JSON, getting a 50% speedup, depending, doesn't look like a convincing test of anything.


Do it in whatever language you want to prove the concepts! I'm not trying to convince of the facts. I'm mentioning an approach to experimentally learning the facts yourself. :)


I doubt it's that alone (and a 10% factor is much more reasonable than a 1% factor). But I think another factor is that column stores usually compress the data in each column. Which can be particularly effective for columns with a lot of repeated values.
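The repeated-values point can be made concrete with run-length encoding, one of the simplest columnar compression schemes. A toy sketch (the `country`-style column data is made up); note that the count can be computed directly on the compressed runs, one addition per run instead of one per row:

```python
def rle_encode(column):
    """Run-length encode a column. Columns with long runs of repeated
    values (e.g. a sorted country column) collapse dramatically."""
    runs = []
    for v in column:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))              # start a new run
    return runs

def count_equal(runs, target):
    """COUNT(*) WHERE col = target, evaluated on the compressed form."""
    return sum(n for v, n in runs if v == target)

col = ["de"] * 1000 + ["fr"] * 500 + ["de"] * 250
runs = rle_encode(col)  # 1750 values collapse to 3 (value, count) runs
```

Real column stores layer dictionary, delta, and bit-packing encodings on top of this idea, and (as the C-Store lineage did) teach the executor to operate on the compressed representation directly.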


Missed compression, thanks!


> and what 'architectural limitations' are you referring to?

Row-based vs columnar-based storage. Latter is more suitable (read much faster) for OLAP workloads.
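The layout difference can be sketched in a few lines of Python (made-up column names, and with lists/arrays standing in for on-disk pages): an analytical aggregate over a row layout has to materialize and skip past every column of every row, while the columnar layout touches only the one contiguous array it needs.

```python
import array

N = 10_000

# Row layout: every row carries every column, including a fat one.
rows = [{"id": i, "price": float(i % 10), "qty": i % 3, "note": "x" * 50}
        for i in range(N)]

# Column layout: one contiguous array per column; SUM(price) reads
# only 8 bytes per value instead of the whole ~80-byte row.
price_col = array.array("d", (float(i % 10) for i in range(N)))

row_sum = sum(r["price"] for r in rows)
col_sum = sum(price_col)
```

Same answer either way, but the columnar scan moves a fraction of the bytes through memory, which is where a lot of the OLAP speedup comes from even before compression and vectorization.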


Columnar storage, pre-aggregation, and advanced indexing techniques based on algorithms like bloom indexes, hyperloglog, etc.


hyperloglog is not an indexing method


What are some modern databases in this context?


*50-100x faster for analytical queries.

Columnar storage. Vertica was the real innovator. Redshift, BigQuery, Snowflake, Citus, Timescale, etc just took the idea and ran with it.


Ah, so it's actually more marketing, since it's comparing completely different approaches.

I wonder if there are performance tradeoffs sometimes leading to slower than vanilla postgres?


The sweet spot for columnar storage is calculating aggregations across a subset of columns. It can be really fast at that. But if what you need is select *, it might actually be slower.


Doing OLTP writes into columnar storage is pretty complex. Also, columnar storage needs to be in memory or at least on SSD, not HDD, where read performance would be abysmal without serious gymnastics (so, given RAM and SSD prices back then, you just couldn't really have widespread columnar DB adoption 15+ years ago).


They claim 4x faster against vanilla postgres for "commercial workloads" so it sounds really impressive.

> I wonder if there are performance tradeoffs sometimes leading to slower than vanilla postgres?

Speculation: alloydb probably has both a row store and a column store and some fancy logic to decide which one to use. That's usually what HTAP means.

In very broad strokes, columnar data stores struggle if you select many columns.


Not really speculation; it's mentioned in the article.

There's not really any free lunch here: vanilla postgres is optimized to perform well within the constraints of a single scale-up host, minimizing storage usage as much as possible.

These Aurora style systems instead take the assumption that storage is relatively cheap in a cloud environment, and likewise with any compute task that can be scaled out instead of up. So move as much as possible out of the scale-up instance to scale-out storage and compute.

Additionally Google is claiming their system is better than Aurora because google has a distributed filesystem, where AWS only has block storage (tied to specific compute) and object storage (much worse performance).


> and likewise with any compute task that can be scaled out instead of up. So move as much as possible out of the scale-up instance to scale-out storage and compute.

This was very much the main lesson I got reading up on DataDog's third-gen event storage system Husky[1]. Great read.

[1] https://www.datadoghq.com/blog/engineering/introducing-husky... https://news.ycombinator.com/item?id=31416843 (185 points, 9d ago, 15 comments)


If your queries read all columns from the same row, that breaks the columnar storage use case and would be slower.


How did the timeline work with Vertica and Paraccel? I know Paraccel was around the same time, but I'm not sure who was really first, or if it's even an important distinction.


Vertica came out of the C-Store paper (https://web.archive.org/web/20100619191833/http://db.lcs.mit...). The C-Store paper pre-dates ParAccel being incorporated. Many of the C-Store authors were involved in Vertica's creation.

If you want to be technical, C-Store was the real innovator...and those people refined and commercialized it as Vertica.


If I remember correctly, IBM published the idea even before Vertica.


What were the innovations by Vertica? Is there any link you have


This is the c-store paper, which was evolved into Vertica: http://www.cs.umd.edu/~abadi/vldb.pdf . It's very readable, worth a look.

edit: column stores existed before c-store, but c-store did some very nifty stuff around integrating compression awareness into the query executor


Thank you, I will read the paper!


This article discusses how TimescaleDB (packaged as an extension to PostgreSQL) approaches performance improvements, though it's a bit of an old piece and things have moved on with TimescaleDB too. It gives some good insights though. https://www.timescale.com/blog/timescaledb-vs-6a696248104e/ There are some more recent articles on the blog about performance and benchmarks if you're tantalized by that one.

Transparency: I work for Timescale


Supposedly, it is significantly slower than SingleStore on various TPC-H SF-10 queries [0].

[0] https://twitter.com/DomenicRavita/status/1529963959465435154


What is the target audience, given Google already has Spanner? To me it sounds like it is targeting only PostgreSQL on-prem users who want to migrate to the cloud. Are there many Fortune 5000 companies running PgSQL at scale in a complex setup where they would get value from moving to a managed solution?


Postgres as an interface does not have the vendor lock-in that Spanner has. You can move to the cloud and back much more easily.


Google seems to be working on offering Postgres interface compatibility to Spanner as well: https://cloud.google.com/spanner/docs/postgresql-interface


In general I think Spanner is more focused on availability than performance (especially across different workloads e.g. analytics/transactions). Also, the Postgres shim for Spanner is not 100% compatible and less mature than Alloy. Last, Spanner is expensive!


Imho, Spanner used to be on the expensive side. But with the introduction of "compute units" and now 4TB storage per node, the price for Spanner looks pretty great.

Links: https://cloud.google.com/spanner/docs/compute-capacity https://cloud.google.com/blog/products/databases/spanner-has...


Spanner usually handles transactional workloads, but AlloyDB as an HTAP system may process analytical workloads at the same time, I think.


Where is the pricing without logging in?


From my layman's perspective, this is an analogue to Amazon's "Aurora with Postgres compatibility" [0]. Can anyone provide a more accurate comparison?

[0] https://aws.amazon.com/rds/aurora


Aurora innovated by replacing the storage layer with a network store. Replicas don't need their own copy of the data (they point at the same network store) making replication lag almost non-existent. It means new replicas can be brought up in minutes. It allows really cool features like fast cloning and point in time recovery.

AlloyDB takes the same approach and adds some additional optimizations in query execution. It introduces a new caching layer and claims to use ML rather than a simple LRU cache. It also introduces cache-only representations of data in a columnar format, which allows it to greatly speed up analytical queries.



