I spent a bunch of time reading the website and I can't tell what this product does: Is it a database engine or an abstraction on top of other products?
If it's a database engine, then here are my thoughts: How is this database engine built to replace SQL and No-SQL? If it doesn't support JOINS why would I replace my SQL with it? Are transactions ACID? Why would anyone use this if there are no built-in user / group security mechanisms to protect data?
Crate is a database engine that uses Lucene for storage and leverages portions of Elastic for cluster management. It is a NoSQL storage engine that gives you an SQL API (via REST). It does not support JOINs yet, but this work is well underway and we expect a release with JOINs soon. We have several customers migrating from MySQL without JOINs (either handling joins in the application layer, using arrays in columns, or denormalizing data). Transactions are not ACID; we fall into the eventually consistent realm. And finally, not all use cases require user/group security; in fact, much of the current SQL usage falls into the single-user category.
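To make "handling joins in the application layer" concrete, here is a minimal sketch in Python. It assumes two result sets have already been fetched as lists of dicts; the table shapes (`users`, `orders`) and the function name `hash_join` are hypothetical and not part of any Crate client API.

```python
from collections import defaultdict


def hash_join(users, orders):
    """Inner-join users.id = orders.user_id in application code."""
    # Build a lookup of orders keyed by the join column.
    orders_by_user = defaultdict(list)
    for order in orders:
        orders_by_user[order["user_id"]].append(order)

    # Probe the lookup once per user row.
    joined = []
    for user in users:
        for order in orders_by_user.get(user["id"], []):
            joined.append({"name": user["name"], "item": order["item"]})
    return joined


# Hypothetical rows standing in for two separate query results.
users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
orders = [
    {"user_id": 1, "item": "book"},
    {"user_id": 1, "item": "pen"},
    {"user_id": 3, "item": "cup"},
]
rows = hash_join(users, orders)
```

The trade-off is that both result sets travel to the client, so this only works comfortably when at least one side is small or pre-filtered.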
I also agree that our site is not as clear as it needs to be and we're working on it already.
That's part of it, but not the whole picture. For example, we mostly bypass the ES query engine and go directly to Lucene. Queries are not simply translated to ES query syntax. Also, we've done a lot more work than simply pasting an SQL layer over the top. We've built streaming BLOB support, a distributed SQL layer with real-time MapReduce, and a distributed aggregation engine that gives accurate results for aggregations rather than HLL estimates.
If you'd like, we're happy to answer any questions in IRC or our Google Group:
https://groups.google.com/forum/#!forum/crateio
IRC Freenode #crate: irc://irc.freenode.net/crate @mention anyone with Voice
do you have a field type that indexes in real time? or are you bound to the (default 1s) index delay from es?
this is one thing that bothers me with elasticsearch: I cannot define e.g. "type": "cart","index":"realtime", "not-analyzed" so that if an item gets added to a cart, the subsequent count would directly return the correct number of items in the cart.
not yet. but we have some "tweaks" for exactly your use-case on our backlog. using the client libraries should make it much easier (e.g. https://crate.io/docs/projects/crate-python/stable/sqlalchem...). so right now you would need to do a refresh.
on a side note: it's not an index delay. it's the readers that "sit" on the lucene index. they are reused for performance reasons (while other writes are appended in the meantime). like the client libraries do, you can force the readers to reopen (https://crate.io/docs/en/0.47.8/sql/reference/refresh.html) - of course at the cost of performance.
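as a rough sketch of the "refresh, then count" pattern for the cart case: the snippet below only builds the two HTTP requests and does not send them. it assumes the SQL endpoint is `/_sql` on `localhost:4200` and that the statement is `REFRESH TABLE <name>` as described in the linked refresh reference; the table `cart_items` and the helper `build_sql_request` are made up for illustration.

```python
import json
import urllib.request


def build_sql_request(stmt, base_url="http://localhost:4200"):
    """Build (but do not send) a POST request carrying one SQL statement."""
    body = json.dumps({"stmt": stmt}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_sql",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical table: force the readers to reopen, then count.
refresh = build_sql_request("REFRESH TABLE cart_items")
count = build_sql_request("SELECT count(*) FROM cart_items WHERE cart_id = 42")

# Sending would be urllib.request.urlopen(refresh), then urlopen(count),
# against a running node; that step is left out here on purpose.
```

issuing the refresh before every read gives up the benefit of the reused readers, so it is best kept to the few queries that really need read-your-own-write behaviour.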
We run aggregations fully distributed, and when iterating over the values we rely heavily on the field caches. They hold the values of the most recently used fields in memory and therefore allow in-memory performance on them. For example, they don't grow linearly with the number of rows stored, but depend on the cardinality of the fields. Running aggregations over non-indexed data is not supported.
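To illustrate what a fully distributed aggregation with exact results can look like in principle, here is a small Python sketch: each shard computes a partial (sum, count) per group, and a coordinator merges the partials. This is a generic two-phase aggregation under assumed data shapes, not Crate's actual implementation; `partial_aggregate`, `merge_partials`, and the sample rows are hypothetical.

```python
def partial_aggregate(shard_rows):
    """Per-shard step: exact (sum, count) for each group key."""
    partial = {}
    for key, value in shard_rows:
        total, count = partial.get(key, (0, 0))
        partial[key] = (total + value, count + 1)
    return partial


def merge_partials(partials):
    """Coordinator step: combine per-shard partials into exact averages."""
    merged = {}
    for partial in partials:
        for key, (total, count) in partial.items():
            m_total, m_count = merged.get(key, (0, 0))
            merged[key] = (m_total + total, m_count + count)
    return {key: total / count for key, (total, count) in merged.items()}


# Two hypothetical shards holding (group_key, value) rows.
shards = [
    [("de", 10), ("us", 20)],
    [("de", 30), ("us", 40), ("us", 60)],
]
averages = merge_partials(partial_aggregate(rows) for rows in shards)
```

The size of each partial depends on the number of distinct group keys rather than the number of rows, which mirrors the cardinality point above.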