The important difference is that we used a more realistic temperature profile, which as you say does affect compression for that column. Schema design (including sort order, compression, and codecs) for the remaining columns is just good ClickHouse practice. Much of the storage and I/O savings is in the date, time, and sensor_id columns.
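For readers who want to see what "good ClickHouse practice" means here, the sketch below is the general shape of such a table; the column names, types, and codec choices are illustrative rather than the exact DDL from the article:

    -- Illustrative ClickHouse schema for per-sensor readings (not the
    -- article's verbatim DDL). Sorting by (sensor_id, date, time) keeps
    -- each sensor's readings adjacent, which is what lets delta-style
    -- codecs shrink the date, time, and sensor_id columns so effectively.
    CREATE TABLE sensor_readings
    (
        sensor_id   UInt32        CODEC(DoubleDelta, LZ4),
        date        Date          CODEC(DoubleDelta, LZ4),
        time        DateTime      CODEC(DoubleDelta, LZ4),
        temperature Decimal(5, 2) CODEC(T64, LZ4)
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(date)
    ORDER BY (sensor_id, date, time);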
It's also useful to note that the materialized view results would be essentially the same no matter how you generate and store the raw data, because the materialized view down-samples temperature max/min to daily aggregates. The aggregated data are vastly smaller regardless of how you generate the readings.
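To make the down-sampling concrete, a daily roll-up materialized view looks roughly like this (table and column names are assumed, not taken from the article):

    -- Illustrative daily roll-up: one aggregate row per sensor per day.
    -- Whatever the raw temperature profile looks like, this target table
    -- is orders of magnitude smaller than the raw readings.
    CREATE MATERIALIZED VIEW sensor_daily_mv
    ENGINE = AggregatingMergeTree
    ORDER BY (sensor_id, date)
    AS SELECT
        sensor_id,
        date,
        minState(temperature) AS temp_min,
        maxState(temperature) AS temp_max
    FROM sensor_readings
    GROUP BY sensor_id, date;

    -- Query side: merge the partial aggregate states into final values.
    SELECT
        sensor_id,
        date,
        minMerge(temp_min) AS daily_min,
        maxMerge(temp_max) AS daily_max
    FROM sensor_daily_mv
    GROUP BY sensor_id, date;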
The article illustrates that if you really had such an IoT app and designed it properly you could run analytics with surprisingly few resources. I think that's a significant point.
That's what you wanted to show, but what you ended up showing is that if you have different data, then the query performance can be quite good.
I get the desire to critique the temperature profile, but completely changing it makes the comparison worthless. From a data perspective it's like saying "if all the sensors just report 1 for temperature on every reading, computing the min, max, and average is super fast". No shit, but that wasn't the task.
But they didn't set the temperature readings to anything that would advantage their tests. Without access to the original data, they simply generated a dataset as close to the original data and volume as possible. The fact that they spent a few sentences discussing the temperature profile doesn't invalidate the test.
Looking at this your way: Scylla used an INT, Altinity used a Decimal type with specialized compression (T64). I can tell you that this would have hampered ClickHouse and advantaged Scylla. It's the opposite of what you're saying. They actually performed this benchmark with one arm tied behind their back.
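If anyone wants to check how much the Decimal + T64 choice actually costs, ClickHouse reports per-column compressed and uncompressed sizes; a query along these lines (table name from the sketch upthread, not the article's schema) shows it directly:

    -- Compare on-disk vs raw bytes for each column of the readings table.
    SELECT
        name,
        formatReadableSize(data_compressed_bytes)   AS on_disk,
        formatReadableSize(data_uncompressed_bytes) AS uncompressed
    FROM system.columns
    WHERE table = 'sensor_readings'
    ORDER BY name;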
It's a funny benchmark anyway because the two systems have very different use cases, but that doesn't invalidate the result.
Then you should provide results for both test datasets to make the point about using a more realistic approach. Materialized views are not news, nor are properly designed analytics applications. For me the important part is how ClickHouse is better and why.
A column store will be orders of magnitude faster at analytical queries than any row-store system. This is fundamental architecture, and the data used makes little to no difference. You could use the exact ScyllaDB dataset duplicated to trillions of rows and still arrive at the same relative performance figures.
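Concretely, the benchmark boils down to scans like the one below. A column store reads only the sensor_id and temperature columns off disk to answer it, while a row store has to pull every full row through memory, so the actual values in those columns barely move the needle (table and column names here are just illustrative):

    -- Per-sensor min/max/avg: a column store touches only the two columns
    -- referenced, regardless of what the temperature values happen to be.
    SELECT
        sensor_id,
        min(temperature) AS temp_min,
        max(temperature) AS temp_max,
        avg(temperature) AS temp_avg
    FROM sensor_readings
    GROUP BY sensor_id;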