It wasn't clear to me how it was related to Postgres, so I spun up a cluster: AlloyDB is Postgres with a different storage engine (similar to the long-awaited zedstore project). That means all Postgres features work.
Yes, it seems to me they have taken the same approach as AWS Aurora: use the Postgres frontend and query parsing, and replace the storage layer and query execution with their own.
This is a good HTAP system for the PostgreSQL ecosystem. Hmm, what about a MySQL-compatible HTAP system? An open source project (https://www.vldb.org/pvldb/vol13/p3072-huang.pdf) seems like a good fit in this area; it processes both transactional and analytical workloads in real time.
Thanks for mentioning TiDB here. It is a MySQL-compatible HTAP database, but different from AlloyDB: we do not reuse MySQL/PostgreSQL code, so there is no legacy code, which makes it more suitable for HTAP scenarios. It is easy to scale out and available on any cloud. More information here: https://docs.pingcap.com/tidb/dev/quick-start-with-htap
The mixed column and row store HTAP approach is very similar to how SingleStore works (https://www.singlestore.com/), which offers a MySQL-compatible interface with a few small differences.
MariaDB Columnstore! And now Xpand has columnar indexes.
Columnstore was previously known as InfiniDB, and Xpand as Clustrix.
Both have been in production for over 10 years and I have been a very happy user of them.
>Recently, at Google I/O, we announced AlloyDB for PostgreSQL, a fully-managed, PostgreSQL-compatible database for demanding, enterprise-grade transactional and analytical workloads.
This is exactly what an HTAP (Hybrid Transactional and Analytical Processing) system does.
Great product positioning. I hope their follow-up is better than typical Google. The ML features are a good start, but I hope they get some BigQuery ML style model-definition-in-SQL features.
The list of supported extensions [1] is interesting and covers the biggies. Still, it always kills me that these hosted "Postgres+" solutions end up meaning no third party extensions. ZomboDB (easy ElasticSearch integration) is a PG extension I've always wanted to try, but haven't been able to.
At least in my experience, most of the time the analytical queries that generate some kind of business report or drive a statistics dashboard need not run several times a second. If you run ad-hoc analytical queries all the time, no matter how advanced the database is, you will bring it to its knees in some way.
You can use age-old tricks to enhance both OLTP and OLAP performance: normalization (3NF or maybe 4NF) with master-detail tables to distribute columns across various tables, individual table partitioning, Postgres's advanced indexing features, and periodic refreshes of materialized views to get decent performance.
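For illustration, the materialized-view trick can be sketched with sqlite3 standing in for Postgres (in real Postgres you'd use CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW on a schedule; the table and column names here are made up):

```python
# Keep a pre-aggregated summary table for the dashboard and rebuild it
# periodically, instead of running the aggregate against the hot OLTP
# table on every page load.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("eu", 10.0), ("eu", 20.0), ("us", 5.0), ("us", 15.0), ("us", 30.0)],
)

def refresh_sales_summary(conn):
    """Rebuild the 'materialized view' -- run every N minutes, not per query."""
    conn.execute("DROP TABLE IF EXISTS sales_summary")
    conn.execute(
        "CREATE TABLE sales_summary AS "
        "SELECT region, COUNT(*) AS orders, SUM(amount) AS total "
        "FROM orders GROUP BY region"
    )

refresh_sales_summary(conn)
# Dashboard queries now hit the tiny summary table instead of scanning orders.
print(dict(conn.execute("SELECT region, total FROM sales_summary")))
```

The dashboard trades a bounded amount of staleness for never touching the transactional table.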
Interesting that they touched on a couple of pain points I've had with Spark execution. Not quite the same thing, but I've wondered why some of the ways it executes aren't smarter: Bloom filters as a cheap way to filter big joins, using single scans to do multiple jobs in the plan... I'll have to have a look.
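The Bloom-filter-before-join idea is easy to sketch (hand-rolled filter with made-up sizes; real engines size the bit array and hash count from expected row counts):

```python
# Build a Bloom filter from the small side of a join, then use it to discard
# most non-matching rows of the big side before the expensive join itself.
# False positives are possible (those rows just fall through to the real
# join), but false negatives are not, so no matches are lost.
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1 << 16, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item):
        # Derive k positions from slices of one SHA-256 digest.
        digest = hashlib.sha256(str(item).encode()).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

# Small side of the join: build the filter once.
small_side_keys = {7, 42, 99}
bf = BloomFilter()
for key in small_side_keys:
    bf.add(key)

# Big side: most rows are rejected with a few bit probes, no hashing the
# full join state needed.
candidates = [k for k in range(1000) if bf.might_contain(k)]
assert small_side_keys <= set(candidates)  # no false negatives
```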
It is hard to overstate how poorly optimized the Postgres architecture is for modern analytical workloads on modern hardware. It is an OLTP engine with 1990s design assumptions. You can modify it heavily to make it adequate for analytics but it won't be anything close to optimal due to architectural limitations. And I say that as a big fan of Postgres.
There is a longstanding heuristic that, outside of the workload for which Postgres was designed, a state-of-the-art database implementation will be 100x faster on the same hardware. That heuristic is well-grounded in reality in my experience; I often measure new database kernel implementations relative to Postgres. Postgres is still great for classic OLTP workloads and you will find that the gap is smaller there. For analytical processing, 100x gain is pretty straightforward.
No one should read this as reflecting poorly on Postgres. Database architecture and the hardware it runs on has evolved enormously in the decades since Postgres was designed. Most people have no sense of just how fast and scalable a modern database kernel is compared to the designs from 20+ years ago.
I did some utterly trivial benchmarks of aggregating over a (large) single table, and found PG and MSSQL to be about the same (both using row storage).
Where does your 100x slowdown figure come from, and what 'architectural limitations' are you referring to? Genuine question, as DB architectures are interesting to me. TIA
Postgres and SQL Server are both similar generations of database architecture, so I would expect them to perform similarly in some aggregate sense. For Postgres, I would argue that the major architectural limitations that impact relative performance are in three areas:
- The storage engine architecture is obsolete and highly suboptimal. It has no way to accommodate modern high-density or high-bandwidth storage, both of which are the norm now, nor is it possible to implement any of the (very successful) I/O schedule optimization concepts that have emerged in recent years. The net effect is a gross underutilization of the capabilities of modern storage hardware.
- The internal page layout and table architecture in Postgres is a textbook classic but not appropriate for most workloads on modern hardware. Modern page engines need great performance across diverse categories of workloads and data models -- we use databases for a lot more than accounting systems these days -- while also being amenable to aggressive use of hardware optimization like SIMD. Thinking in terms of "row-oriented" and "column-oriented" is a false dichotomy; optimized page engines commonly have many elements of both but are not identifiable as pure expressions of either nor even a simple hybrid (e.g. PAX layouts). This is one of the easiest changes to backport into old database architectures.
- Indexing in some modern systems has been completely reimagined to great effect. Tables are effectively index-organized with no secondary indexing structures, but without loss of high query selectivity across diverse columns; there is a separation of the concerns regarding fast search and key constraint enforcement; major increases in index write throughput concurrent with major decreases in storage and memory footprint (no B-tree bloat). For small tables, this optimization has no effect and may even be a mild pessimization depending on the workload. As tables become larger the performance starts to diverge greatly, becoming multiple orders of magnitude faster. The Postgres architecture can't be modified to support this.
If you built a database engine that was approximately state-of-the-art in these three areas, I would expect a 100x difference in performance relative to Postgres for many large data model workloads using server hardware like an AWS i3en.24xlarge. Obviously building such a database would be a hell of a lot of work, it isn't something you can reasonably do for fun on nights and weekends.
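The PAX-style hybrid layout mentioned in the second point can be sketched like this (a toy Python model, not any real engine's format): rows land on a page together, as in a row store, but within the page values are grouped per column.

```python
# Toy PAX-style page: each page holds whole rows, but internally stores
# the values of each column contiguously ("minipages"), so a single-column
# scan touches contiguous memory instead of striding across full rows.
class PaxPage:
    def __init__(self, n_columns):
        self.minipages = [[] for _ in range(n_columns)]

    def insert(self, row):
        # A row's values are scattered across the per-column minipages.
        for minipage, value in zip(self.minipages, row):
            minipage.append(value)

    def scan_column(self, col_index):
        # Column scan reads one contiguous minipage, not every full row.
        return self.minipages[col_index]

    def get_row(self, row_index):
        # Point lookups still reconstruct a full row from one page.
        return tuple(minipage[row_index] for minipage in self.minipages)

page = PaxPage(n_columns=3)
for row in [(1, "alice", 30), (2, "bob", 25), (3, "carol", 41)]:
    page.insert(row)

assert page.scan_column(2) == [30, 25, 41]
assert page.get_row(1) == (2, "bob", 25)
```

The point is that "row store vs column store" is a spectrum: this layout keeps row-store locality for point lookups while giving column scans cache-friendly access.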
Edit: this comes across rude (I suppose it is), and maybe I'm wrong to doubt you but I'm no DB n00b and pretty well everything about your post just doesn't make sense, given my knowledge.
What? [deleted] A quick browse then...
> The storage engine architecture is obsolete and highly suboptimal
cite?
> It has no way to accommodate modern high-density or high-bandwidth storage
> nor is it possible to implement any of the (very successful) I/O schedule optimization concepts that have emerged in recent years. The net effect is a gross underutilization of the capabilities of modern storage hardware.
No evidence given.
> Modern page engines need great performance across diverse categories of workloads and data models
It's chunks in memory that's all, how is it failing then?
> Indexing in some modern systems has been completely reimagined to great effect. Tables are effectively index-organized with no secondary indexing structures
just what?
> but without loss of high query selectivity across diverse columns
selectivity is a feature of data, not how it is organised, at all
> there is a separation of the concerns regarding fast search and key constraint enforcement
what? what? what? what? what? what? what? what? Search is a read issue, key constraints are a write issue...?
> no B-tree bloat
seriously WTF?
> As tables become larger the performance starts to diverge greatly, becoming multiple orders of magnitude faster
"multiple orders of magnitude" - Wuuuut? cite please
I'm on a long road trip, so can't spend a lot of time responding right now (gotta hit the road). But some quick points:
The page layout is important because many workloads are memory-bandwidth bound. Effective selectivity mechanisms outside the page reduce the need for intra-page selectivity optimization in terms of delivering performance but you still need to preserve memory bandwidth. This biases designs with good page-external selectivity mechanisms to optimize for widening the set of workloads they perform well on instead of squeezing out slightly more selectivity for narrow workloads.
Some of these assertions are self-evident and non-controversial, such as the poor storage performance of Postgres and the issue of B-tree bloat generally. Using mmap() for storage is the low-performance option, articles regarding which are regularly posted on HN, so the fact that it "has mmap" is a good example of why it is expected to perform poorly (though it isn't the only reason in the case of Postgres). I believe there are plans in the works to potentially redesign the Postgres storage layer in future versions, so this may improve at some point.
But more broadly, you seem to be a bit confused about the theory of database kernel design and the tradeoffs that can be made there? You are questioning things about how actual, real, databases, including most open source ones, are designed.
> Effective selectivity mechanisms outside the page reduce the need for intra-page selectivity optimization in terms of delivering performance but you still need to preserve memory bandwidth
"mechanisms/optimization/delivering performance" - these are just aspirational words - exactly what 'selectivity mechanisms'? I can barely make sense of this. In fact, I can't. At all.
> Some of these assertions are self-evident and non-controversial
They aren't self-evident because they aren't evident to me, and as for non-controversial - this is just ducking the question. You're still not justifying anything.
> and the issue of B-tree bloat generally
A btree page will be between 1/2 and completely full. Random insertions should make them ~75% full. In MSSQL I just created a table of ints (clustered PK) and inserted about a million random ints and had a look at fullness (page = 8060 bytes):
- Avg. Bytes Free per Page.....................: 2046.7
- Avg. Page Density (full).....................: 74.71%
Yep, so an average of 3/4 full. Bit of expected free space. Where's the bloat you talk of, and what's the alternative structure if this is unacceptable?
Edit: this is just leaf nodes. You might claim it's ignoring non-leaf nodes. Check: 4MB data gets you 56K non-leaf. Massive 2,000 factor fanout (8K page / 4 byte ints) is why. So what bloat?
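For what it's worth, the arithmetic above checks out; a quick Python sanity check reusing the figures from the experiment (these numbers are specific to that test, not general truths):

```python
# SQL Server reports ~8060 usable bytes per 8KB page; avg free bytes per
# page came from the experiment above.
usable_bytes = 8060
avg_free_bytes = 2046.7

# Leaf density: about 3/4 full, matching the reported 74.71%.
density = (usable_bytes - avg_free_bytes) / usable_bytes
assert 0.74 < density < 0.76

# Fanout for 4-byte int keys: roughly page_size / key_size entries per
# non-leaf node, which is why the non-leaf levels (56K for 4MB of data)
# are a rounding error next to the leaf level.
fanout = usable_bytes // 4
assert fanout == 2015  # i.e. the "massive 2,000 factor fanout" above
```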
> Using mmap() for storage is the low-performance option, articles regarding which are regularly posted on HN,
for small files, so I understand. For large AFAIK they're good, which is why they're used. So what's the alternative? Point to where HN says mmap is slow for large files.
> that it "has mmap" is a good example of why it is expected to perform poorly
justify this please.
> But more broadly, you seem to be a bit confused about the theory of database kernel design and the tradeoffs that can be made there?
You may be right but you've just handwaved my doubts away without any evidence.
> You are questioning things about how actual, real, databases, including most open source ones, are designed.
I'm questioning your claims in the hope I can understand better. I'm not a total noob. I am wondering why I can't make sense of your posts.
I need some ELI5 explaining here: colstores allow you to pick out just the cols of data you need rather than get the full row and discard what you don't need.
If that alone explained the 100x speedup, that would mean the row is storing 99% data that's not of interest to the query. That would a) be unlike pretty much every table structure + query I've ever seen, and b) in the cases where it does apply, such as when a fat blob is stored in the row, you hive off the blobs to a separate table and only join to it when the blob's wanted.
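A toy back-of-the-envelope version of this argument (made-up 4-column rows, with bytes touched as a crude proxy for scan cost):

```python
# With 4 modest columns, skipping the 3 you don't need saves roughly the
# fraction of the row they occupy -- a few x, nowhere near 100x.
rows = [(i, f"name{i}", i % 100, i * 2.5) for i in range(10_000)]

# Row store: the aggregate touches every byte of every row.
row_bytes_touched = sum(len(str(field)) for row in rows for field in row)

# Column store: the same aggregate touches only the one column it needs.
amounts = [row[3] for row in rows]  # the "column", stored contiguously
col_bytes_touched = sum(len(str(v)) for v in amounts)

ratio = row_bytes_touched / col_bytes_touched
# The saving here is ~3-4x; the rest of a 100x gap has to come from
# elsewhere (compression, vectorized execution, late materialization...).
assert 2 < ratio < 10
```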
You might be interested in [0], which is one of the course readings for [1] which has video available ("Storage Models, Data Layout, & System Catalogs"). Specifically, the paper asks if you can turn a row-store into a column-store by just vertically partitioning the schema or building more indexes etc; the answer is no, and they go into various reasons why (late materialization 3x, compression 2x to 10x depending on whether query is accessing sorted data, etc).
Might be missing something, but an interpreted language that splashes its objects around in memory with pointers everywhere, uses JSON, and gets a 50% speedup, depending, doesn't look like a convincing test of anything.
Do it in whatever language you want to prove the concepts! I'm not trying to convince of the facts. I'm mentioning an approach to experimentally learning the facts yourself. :)
I doubt it's that alone (and a 10% factor is much more reasonable than a 1% factor). But I think another factor is that column stores usually compress the data in each column. Which can be particularly effective for columns with a lot of repeated values.
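A minimal run-length encoding sketch (not what any particular engine does verbatim, but representative of why repeated values compress so well when a column is stored contiguously):

```python
# A status column with long runs of repeated values collapses to a handful
# of (value, run_length) pairs -- row storage interleaves other columns
# between the repeats and destroys these runs.
from itertools import groupby

def rle_encode(column):
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

status_column = ["ok"] * 9_000 + ["error"] * 50 + ["ok"] * 950
encoded = rle_encode(status_column)

assert encoded == [("ok", 9_000), ("error", 50), ("ok", 950)]
assert len(encoded) == 3  # 10,000 values collapsed to 3 runs
```

Real column stores layer dictionary, bit-packing, and delta encodings on top of this, and can often run predicates and aggregates directly on the compressed form.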
The sweet spot for columnar storage is calculating aggregations across a subset of columns. It can be really fast at that. But if what you need is select *, it might actually be slower.
Doing OLTP writes into columnar storage is pretty complex. Also, columnar storage needs to be in memory or at least on SSD, not HDD, where read performance would be abysmal without serious gymnastics (in particular, given RAM and SSD prices back then, you just couldn't really have widespread columnar DB adoption 15+ years ago).
Not really speculation; it's mentioned in the article.
There's not really any free lunch here: vanilla Postgres is optimized to perform well within the constraints of a single scale-up host, minimizing storage usage as much as possible.
These Aurora style systems instead take the assumption that storage is relatively cheap in a cloud environment, and likewise with any compute task that can be scaled out instead of up. So move as much as possible out of the scale-up instance to scale-out storage and compute.
Additionally Google is claiming their system is better than Aurora because google has a distributed filesystem, where AWS only has block storage (tied to specific compute) and object storage (much worse performance).
> and likewise with any compute task that can be scaled out instead of up. So move as much as possible out of the scale-up instance to scale-out storage and compute.
This was very much the main lesson I got reading up on DataDog's third-gen event storage system Husky[1]. Great read.
How did the timeline work with Vertica and ParAccel? I know ParAccel was around the same time, but I'm not sure who was really first, or if it's even an important distinction.
This article discusses how TimescaleDB (packaged as an extension to PostgreSQL) approaches performance improvements, though it's a bit of an old piece and things have moved on with TimescaleDB too. It gives some good insights though. https://www.timescale.com/blog/timescaledb-vs-6a696248104e/ There are some more recent articles on the blog about performance and benchmarks if you're tantalized by that one.
What is the target audience, knowing Google already has Spanner? To me it sounds like it is targeting only on-prem PostgreSQL users who want to migrate to the cloud. Are there many Fortune 5000 companies running PostgreSQL at scale in a complex setup where they would get value from moving to a managed solution?
In general I think Spanner is more focused on availability than performance (especially across different workloads e.g. analytics/transactions). Also, the Postgres shim for Spanner is not 100% compatible and less mature than Alloy. Last, Spanner is expensive!
Imho, Spanner used to be on the expensive side. But with the introduction of "compute units" and now 4TB storage per node, the price for Spanner looks pretty great.
Aurora innovated by replacing the storage layer with a network store. Replicas don't need their own copy of the data (they point at the same network store) making replication lag almost non-existent. It means new replicas can be brought up in minutes. It allows really cool features like fast cloning and point in time recovery.
AlloyDB takes the same approach and adds some additional optimizations in query execution. It introduces a new caching layer and claims to use ML rather than a simple LRU cache. It also introduces cache-only representations of data in a columnar format, which allows it to greatly speed up analytical queries.
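For reference, the simple LRU baseline being improved on looks roughly like this (a sketch only; AlloyDB's actual cache internals aren't public):

```python
# A buffer cache with least-recently-used eviction: on a miss, load the page
# and, if over capacity, evict whichever page was touched longest ago.
from collections import OrderedDict

class LRUBufferCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> page data, oldest first

    def get(self, page_id, load_from_storage):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # hit: mark most recently used
            return self.pages[page_id]
        page = load_from_storage(page_id)     # miss: go to storage
        self.pages[page_id] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict least recently used
        return page

cache = LRUBufferCache(capacity=2)
loads = []
fake_storage = lambda pid: loads.append(pid) or f"page-{pid}"

cache.get(1, fake_storage)
cache.get(2, fake_storage)
cache.get(1, fake_storage)   # hit: no storage read
cache.get(3, fake_storage)   # evicts page 2, the LRU entry
assert loads == [1, 2, 3]
assert list(cache.pages) == [1, 3]
```

An ML-driven policy would replace the recency-only eviction choice with a learned prediction of which pages are likely to be re-read.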