
The implication is that ClickHouse can't easily support transactional queries. That's why it's an OLAP and not an OLTP database (Online Analytical Processing vs. Online Transaction Processing).


This is not the implication at all.

ClickHouse could easily add fsync; they just choose not to.

MongoDB also did not use fsync and was ridiculed for it, yet no one mentions this about ClickHouse.


> MongoDB also did not use fsync and was ridiculed for it, yet no one mentions this about ClickHouse.

MongoDB claimed to be a replacement for RDBMS-es (which includes OLTP). ClickHouse is explicit about being OLAP-only. MongoDB also hid the fact that they weren't doing fsync, especially when showing off "benchmarks" against OLTP RDBMS-es, while ClickHouse has not tried to show themselves as a replacement for OLTP RDBMS-es.

> ClickHouse could easily add fsync; they just choose not to.

For good reason. It's not a simple matter of choosing one of two options. The choice has consequences: performance.


I can't find any evidence showing that OLAP means it is okay to lose data from unexpected shutdowns. How can you have correct analytics without a complete set of data?

> For good reason. It's not a simple matter of choosing one of two options. The choice has consequences: performance.

It is a simple matter, though. They could choose to sacrifice some performance for data durability, and I suspect the performance hit would be small, since ClickHouse behaves like an append log. It just seems that Yandex doesn't care much about durability, since they use the database to store people's web traffic. They wouldn't care if some of that data were lost, so they don't use fsync.
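To make the trade-off concrete, here is a minimal Python sketch of appending to a log with and without fsync. The helper name `append_batch` and its `durable` flag are hypothetical and not part of ClickHouse; the point is only that without `os.fsync`, a write can return while the data still sits in the OS page cache, where a power loss or kernel crash can drop it.

```python
import os

def append_batch(path: str, rows: list[str], durable: bool) -> None:
    """Append newline-terminated rows to a log file (hypothetical helper).

    durable=False: returns once the OS has the bytes; they may still be
    in the page cache and can be lost on power failure or kernel crash.
    durable=True: os.fsync blocks until the OS reports the file's data
    has reached stable storage.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write("".join(row + "\n" for row in rows))
        f.flush()  # hand Python's userspace buffer to the OS
        if durable:
            os.fsync(f.fileno())  # the extra, potentially slow, step
    # For a newly created file, POSIX also requires fsyncing the parent
    # directory to make the directory entry durable; omitted in this sketch.
```

Because one `fsync` covers a whole batch, its cost is spread across every row in that batch; how large the penalty actually is depends on the storage hardware and the batch sizes.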


> I can't find any evidence showing that OLAP means it is okay to ...

OLAP also doesn't mean "be the source of truth of the data". You can have a separate source of truth of the "complete set of data" outside of your OLAP engine and load (and reload) data into your OLAP engine any time you're not sure if you have the "complete set of data" in it.

The important difference lies in how often one finds oneself in that situation. In OLAP, far more time is spent querying (i.e., reading) data than loading (i.e., writing) data and waiting for it to be durably saved (i.e., fsync-ed). Because of this imbalance, it makes sense to prioritise the common scenario and handle the rare one sub-optimally.

> They wouldn't care if some of that data is lost so they don't use fsync.

Or they can still care about data correctness and simply reload any data they suspect is (or may be) inconsistent in the rare case of an improper shutdown. It's not as if they use ClickHouse as their primary data store.
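The reload strategy described above can be sketched as a reconciliation pass. This is a minimal Python illustration, assuming both the source of truth and the OLAP copy are partitioned by a key such as a date, with in-memory dicts standing in for both stores. The function name `reconcile` and the row-count check are hypothetical choices, not something ClickHouse provides; a count mismatch is a cheap but weak signal, and a stricter check might compare checksums instead.

```python
def reconcile(source: dict[str, list[dict]], olap: dict[str, list[dict]]) -> list[str]:
    """Reload every partition whose OLAP row count differs from the source.

    `source` is the authoritative copy; `olap` is modified in place.
    Returns the keys of the partitions that were reloaded.
    """
    reloaded = []
    for partition, rows in source.items():
        if len(olap.get(partition, [])) != len(rows):
            # Drop whatever survived the bad shutdown and re-copy the partition.
            olap[partition] = [dict(r) for r in rows]
            reloaded.append(partition)
    return reloaded
```

For example, if the OLAP copy lost one row of a two-row partition after a crash, `reconcile` reports that partition's key and replaces it with a fresh copy from the source, leaving intact partitions untouched.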


To add to pritambaral's comments:

The top commercial high-performance time-series databases used by banks to make decisions about your money (which ClickHouse can usually beat) also don't use fsync. You can literally quit the software and watch your transaction data get written out 5 seconds later.

Edit: a word
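One way such a delay can arise is an application-level periodic flush, sketched below in Python. This is only an illustration under assumptions: the class name `PeriodicFlushWriter` and the 5-second default interval are hypothetical, chosen to mirror the comment above, and are not taken from any particular database. Rows appended since the last sync can be lost on a crash, which is the window being traded for throughput.

```python
import os
import time
from typing import Callable

class PeriodicFlushWriter:
    """Hypothetical append-only writer that fsyncs at most once per interval."""

    def __init__(self, path: str, interval_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._f = open(path, "a", encoding="utf-8")
        self._interval = interval_s
        self._clock = clock  # injectable so tests need not sleep
        self._last_sync = clock()
        self.syncs = 0  # number of fsync calls made so far

    def append(self, row: str) -> None:
        # The row only reaches Python's buffer here; it is not yet durable.
        self._f.write(row + "\n")
        now = self._clock()
        if now - self._last_sync >= self._interval:
            self._sync(now)

    def _sync(self, now: float) -> None:
        self._f.flush()              # hand buffered rows to the OS
        os.fsync(self._f.fileno())   # ask the OS to persist them
        self._last_sync = now
        self.syncs += 1

    def close(self) -> None:
        # A clean shutdown syncs whatever is still pending.
        self._sync(self._clock())
        self._f.close()
```

Only a clean `close()` guarantees the tail is persisted; a crash between syncs loses up to one interval's worth of rows.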


Oh, that’s not too bad. They’re very explicit about not having transaction support. Thanks for explaining.



