Hacker News

I am curious about the motivation for choosing ClickHouse over Apache Pinot and Apache Druid. It could be helpful for other folks when choosing an OLAP db from among them.


For us, a significant reason was the ClickHouse cloud-hosted offering, rather than having to manage a cluster ourselves. Their use of S3 as the backing storage medium means that large-scale data retention is quite affordable.

A good comparison we've referenced: https://leventov.medium.com/comparison-of-the-open-source-ol...


For reference, Apache Druid has an equivalent in Imply Polaris, and Apache Pinot has an equivalent in Startree. I can't speak for Startree, but Polaris similarly uses S3 for backing.


When I was highly engaged with Imply (Druid) a few years ago, S3 was also used as a backing storage. Is this not the case anymore?


I think both Pinot and Druid offer cloud-hosted solutions nowadays. Maybe you started early, when only ClickHouse had that offering. Is cloud hosting the only reason you chose ClickHouse? I am also wondering whether it's possible to let users choose the data source.


Slight hijack. I / we went through a very similar tech selection process for timeseries metrics (not logging) ~1.5 years ago. We looked at Druid, ElasticSearch, TimeScale and a bunch of others.

Main takeaways were: the SQL flavor and its aggregations in CH are amazing. Running on a single node for dev laptops is trivial. It’s crazy fast with almost zero tuning.

It does not surprise me at all that CH is powering new products and startups.

Note: hosted CH did not exist yet. We are using Altinity to run our cluster.


> Note: hosted CH did not exist yet. We are using Altinity to run our cluster.

It exists now actually. We (highlight) are on hosted ClickHouse, which went GA a few months ago. https://clickhouse.com/cloud


Thanks for the shout out! "Altinity" in this case means Altinity.Cloud, which is a high-performance cloud ClickHouse. It's been around for over 2.5 years.

Disclaimer: I work at Altinity.


if you can afford SQL then you're not really doing timeseries in any meaningful sense


Clickhouse is fast and doesn’t have absurd architectural complexity.


+1


Not having to deal with a JVM is a major plus tbh.


I've seen so many variations of this comment on HN and I'm still not sure why not having to deal with the JVM is a major plus.


I'm perfectly fine with JVMs, but at a guess, some of it is the usual snobbery toward anything unfamiliar. But some of it is due to associating JVMs with enterprise nightmares. And some is that JVM tuning is a bit of a dark art. I've made some very good money going in and turning JVM knobs that others were afraid to touch. (The secret, by the way, is to hack together some decent load simulation and then measure not just median numbers but things like 99th percentile latency.)
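The measure-the-tail approach above can be sketched in a few lines. This is a minimal, hypothetical illustration (the `request_fn` stand-in and `percentile` helper are mine, not any particular tool): fire requests, record wall-clock latencies, then read off the median and the 99th percentile rather than just the mean.

```python
import random
import time

def percentile(samples, p):
    """Return the p-th percentile of a list of samples (nearest-rank method)."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, int(round(p / 100 * len(ordered))) - 1))
    return ordered[k]

def simulate_load(request_fn, n=1000):
    """Fire n requests sequentially and record the latency of each."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        request_fn()
        latencies.append(time.perf_counter() - start)
    return latencies

# Hypothetical stand-in for a real request; swap in an actual client call.
lat = simulate_load(lambda: time.sleep(random.uniform(0.0, 0.002)))
print("median:", percentile(lat, 50))
print("p99:   ", percentile(lat, 99))
```

The point of looking at p99 instead of the median is that GC pauses and similar hiccups hide entirely in the tail: a system can have a great median and still stall one request in a hundred.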


Have you ever operated a fleet of critical JVM instances and needed more memory? Don't go over 32GB of RAM in an instance or the operating characteristics of your entire app change. Compressed ordinary object pointers ("oops"). They are a blast to debug / operate!

https://stackoverflow.com/questions/25120546/trick-behind-jv...
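The 32GB cliff comes from compressed oops: with the default 8-byte object alignment, the JVM can store references as 32-bit offsets, which cover 2^32 × 8 bytes of heap. A quick back-of-the-envelope check (plain arithmetic, no JVM involved):

```python
# With compressed oops, object references are stored as 32-bit values
# that are shifted left by 3 bits (8-byte alignment) before being used
# as addresses, so the largest addressable heap is 2^32 * 8 bytes.
ALIGNMENT = 8          # default JVM object alignment in bytes
REF_BITS = 32          # width of a compressed reference

max_heap = (2 ** REF_BITS) * ALIGNMENT
print(max_heap // 2 ** 30, "GiB")  # → 32 GiB

# Above this, the JVM falls back to full 64-bit pointers, so every
# reference doubles in size: a heap slightly over 32GB can hold less
# live data than one just under it.
```

This is why the common advice is to cap heaps around 30-31GB, or jump well past 32GB so the extra memory outweighs the fatter pointers.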


JVM runtimes have a relatively high startup cost, are not often good 'citizens' in an instance running multiple types of software, and the build processes for a lot of JVM deliverables are an ungodly mess.

Many of those bells and whistles are near-necessary in the enterprise world, but you have the accumulated mass of 'red zones' and developmental landmines in that ecosystem that can quickly turn you off it as a whole if you want to understand the whole system.


I still don't understand some of this -- I developed in Java for 5+ years.

>JVM runtimes have a relatively high startup cost

I think many people are okay with that when developing server software that's going to run for weeks at a time. It can get a bit annoying when trying to rapidly iterate. And I think things are changing pretty quickly with AOT builds and general improvements.

>and the build processes for a lot of JVM deliverables is an ungodly mess.

I recall using "mvn package." That's it. This was on two different systems that served a good bit of traffic and weren't simple trivial projects.


I don't know if it's a standard Java thing or just an IntelliJ thing, but there's a setting that will hot-patch a running JVM when you change code. Things can get messy if you (or your dependencies) make assumptions about the ClassLoader being used, but other than that it works great.

Still not as good as C#'s debugger in Visual Studio (hit a breakpoint, edit the code, drag the execution back before the problem, resume and run the patched version) but nothing I've seen really is.

Setting up Gradle projects is a bit more involved depending on your setup, but in the end it's still a single command to build an executable JAR.


Yeah, it's been a second since I've used IntelliJ/Spring, but I recall that being the case as well.

Gradle is something I've never messed with, but that makes sense.


I take it you haven't experienced the hell that is to deal with Hadoop JARs. It's absolutely ridiculous.


Having to worry about GC in a database is a pretty bad experience. It also tends to require way more resources than necessary, and just a pretty complex configuration


gc isn't the issue, the jvm is the issue


basically, the jvm is technically sophisticated but operationally complicated

it sucks to use

many people believe otherwise, but those people have rich jvm experience, which is not easy to get


Druid has like 9 different node types and inherits the whole Hadoop configuration mess and complexity


3, and there's absolutely no need for hadoop, particularly with MSQ


Anecdata: tried out Druid and Clickhouse for my SaaS. Couldn’t get Druid working. CH ran in 2 minutes.


Interesting, just found another post from yesterday about the comparison: https://news.ycombinator.com/item?id=35642522, though the comparison is coming from the Pinot team.

