I spent a bunch of time reading the website and I can't tell what this product d...

spanktar · on April 6, 2015

Crate is a database engine that uses Lucene for storage and leverages portions of Elastic for cluster management. It is a NoSQL storage engine that gives you an SQL API (via REST). It does not support JOINs yet, but this work is well underway and we expect a release with JOINs soon. We have several customers migrating from MySQL without JOINs (either handling joins in the application layer, using arrays in columns, or denormalizing data). Transactions are not ACID; we fall into the eventually consistent realm. And finally, not all use cases require user/group security; in fact, much of the current SQL usage falls into the single-user category.
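For readers unsure what "handling joins in the application layer" means in practice, here is a minimal sketch. It assumes the two row lists came back from two separate single-table queries; the table contents, column names, and the `hash_join` helper are all hypothetical and not part of Crate's API.

```python
# Hypothetical rows, as two separate single-table queries might return them.
users = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]
orders = [
    {"order_id": 10, "user_id": 1, "total": 25.0},
    {"order_id": 11, "user_id": 1, "total": 5.5},
    {"order_id": 12, "user_id": 3, "total": 9.0},  # no matching user
]


def hash_join(left, right, left_key, right_key):
    """Inner join of two lists of dicts, done in application code."""
    # Build an in-memory index over the left side, keyed by the join column.
    index = {}
    for row in left:
        index.setdefault(row[left_key], []).append(row)

    # Probe the index once per right-side row and merge matching pairs.
    joined = []
    for row in right:
        for match in index.get(row[right_key], []):
            joined.append({**match, **row})
    return joined


result = hash_join(users, orders, "id", "user_id")
for row in result:
    print(row["name"], row["order_id"], row["total"])
```

The trade-off is that both result sets have to fit in the application's memory, which is one reason denormalizing or using array columns can be the more practical option.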

I also agree that our site is not as clear as it needs to be and we're working on it already.

arthursilva · on April 6, 2015

Seems like it's a layer over ElasticSearch using the Presto SQL parser. SQL queries are translated into ES queries.

spanktar · on April 6, 2015

That's part of it, but not the whole picture. For example, we mostly bypass the ES query engine and go directly to Lucene. Queries are not simply translated to ES query syntax. Also, we've done a lot more work than simply pasting an SQL layer over the top. We've built streaming BLOB support, a distributed SQL layer with real-time MapReduce, and a distributed aggregation engine that gives accurate results for aggregations rather than HLL estimates.
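To make the "accurate results rather than HLL estimates" point concrete, here is a rough sketch of the general map/reduce pattern for exact distributed aggregation. It is not Crate's implementation; the shard data and the `map_shard` and `reduce_partials` names are invented for illustration.

```python
def map_shard(rows):
    """Compute a partial aggregate over the rows held by one shard."""
    total = 0.0
    count = 0
    users = set()  # exact set of distinct values, not a sketch/estimate
    for user, amount in rows:
        total += amount
        count += 1
        users.add(user)
    return {"sum": total, "count": count, "users": users}


def reduce_partials(partials):
    """Merge per-shard partials into one exact global result."""
    total = 0.0
    count = 0
    users = set()
    for p in partials:
        total += p["sum"]
        count += p["count"]
        users |= p["users"]  # set union keeps the distinct count exact
    avg = total / count if count else None
    return {"avg": avg, "distinct_users": len(users)}


# Three hypothetical shards holding (user, amount) rows.
shards = [
    [("alice", 10.0), ("bob", 20.0)],
    [("alice", 30.0)],
    [("carol", 40.0), ("bob", 0.0)],
]

result = reduce_partials(map_shard(rows) for rows in shards)
print(result)
```

The cost of exactness is visible here: each shard ships its full set of distinct values to the merging node, whereas a HyperLogLog sketch would ship a small fixed-size structure and accept an approximate count.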

If you'd like, we're happy to answer any questions in our Google Group (https://groups.google.com/forum/#!forum/crateio) or on IRC, Freenode #crate (irc://irc.freenode.net/crate); @mention anyone with voice.

naiv · on April 6, 2015

do you have a field type that indexes in real time? or are you bound to the (default 1s) index delay from es?

this is one thing that bothers me with elasticsearch: I cannot define e.g. "type": "cart", "index": "realtime", "not-analyzed", so that if an item gets added to a cart, a subsequent count would directly return the correct number of items in the cart.

jodok · on April 6, 2015

not yet. but we have some "tweaks" for exactly your use-case on our backlog. using the client libraries should make it much easier (e.g. https://crate.io/docs/projects/crate-python/stable/sqlalchem...). so right now you would need to do a refresh. on a side note: it's not an index delay. it's the readers that "sit" on the lucene index. they are being reused for performance reasons (and meanwhile other writes are appending). as the client libraries do, you can force reopening them (https://crate.io/docs/en/0.47.8/sql/reference/refresh.html) - of course at the cost of performance.
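A toy model may help show why a count can lag behind a write until a refresh. This is only a sketch of the reader-snapshot idea described above; the `ToyIndex` class and its methods are hypothetical and do not correspond to Lucene's or Crate's actual API.

```python
class ToyIndex:
    """Append-only store whose reader only sees a snapshot of the data."""

    def __init__(self):
        self._docs = []    # everything written so far
        self._visible = 0  # number of docs the currently open reader can see

    def add(self, doc):
        # Writes append immediately but do not touch the open reader.
        self._docs.append(doc)

    def refresh(self):
        # Reopen the reader so that it sees all writes made so far.
        self._visible = len(self._docs)

    def count(self, predicate):
        # Queries run against the reader's snapshot, not the raw write log.
        return sum(1 for d in self._docs[: self._visible] if predicate(d))


def in_cart_1(doc):
    return doc["cart"] == 1


cart = ToyIndex()
cart.add({"cart": 1, "item": "book"})
cart.refresh()

cart.add({"cart": 1, "item": "pen"})
stale = cart.count(in_cart_1)  # reader has not been reopened yet

cart.refresh()
fresh = cart.count(in_cart_1)  # reader now sees the second write

print(stale, fresh)
```

In this model the second item is stored safely as soon as `add` returns; it is only invisible to queries until the reader is reopened, which matches the "it's not an index delay" remark. The linked reference page documents the real refresh statement.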

ddorian43 · on April 6, 2015

Why can't you aggregate on non-indexed fields? I know Lucene doesn't allow that, but why? It seems to work in a normal RDBMS.

jodok · on April 6, 2015

We run aggregations fully distributed, and when iterating over the values we rely heavily on the field caches. They hold the values of the most recently used fields in memory and therefore allow in-memory performance on them. For example, they don't grow linearly with the number of rows stored, but depend on the cardinality of the fields. Running aggregations over non-indexed data is not supported.

jodok · on April 6, 2015

arthur, that's how we started two years ago. meanwhile the queries are being handled by our own engine. https://github.com/crate/crate