I poked around trying to find a high level understanding.

Here's the best place to start from what I could tell: https://docs.pinot.apache.org/basics/concepts

Based on that, it's an MPP columnar database focused on low-latency queries over streaming-ingested, near-real-time data, open sourced by LinkedIn's infra teams:

"Pinot is designed to deliver low latency queries on large datasets. To achieve this performance, Pinot stores data in a columnar format and adds additional indices to perform fast filtering, aggregation and group by.

Raw data is broken into small data shards. Each shard is converted into a unit called a segment. One or more segments together form a table, which is the logical container for querying Pinot using SQL/PQL.

... Logically, a cluster is simply a group of tenants. As with the classical definition of a cluster, it is also a grouping of a set of compute nodes. Typically, there is only one cluster per environment/data center. There is no need to create multiple clusters since Pinot supports the concept of tenants. At LinkedIn, the largest Pinot cluster consists of 1000+ nodes distributed across a data center. The number of nodes in a cluster can be added in a way that will linearly increase performance and availability of queries."
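To make the "columnar format plus additional indices" idea from that quote concrete, here's a toy Python sketch I put together. It is not Pinot's code or API: `ColumnarSegment`, `sum_where`, and `query_table` are names I made up, and the only thing it tries to show is how per-column storage plus an inverted index lets a filtered aggregation touch just the matching rows, with a table being nothing more than a list of segments.

```python
from collections import defaultdict


class ColumnarSegment:
    """Toy segment (hypothetical, not Pinot's format): one list per column,
    plus an inverted index for each column we want to filter on."""

    def __init__(self, rows, indexed_columns):
        # Store values column by column instead of row by row.
        self.columns = defaultdict(list)
        for row in rows:
            for name, value in row.items():
                self.columns[name].append(value)

        # Inverted index: column value -> list of row ids holding that value.
        self.inverted = {}
        for name in indexed_columns:
            index = defaultdict(list)
            for doc_id, value in enumerate(self.columns[name]):
                index[value].append(doc_id)
            self.inverted[name] = index

    def sum_where(self, metric, column, value):
        # Look up matching row ids in the index, then read only the metric column.
        doc_ids = self.inverted[column].get(value, [])
        values = self.columns[metric]
        return sum(values[i] for i in doc_ids)


def query_table(segments, metric, column, value):
    """A 'table' here is just a list of segments; combine per-segment results."""
    return sum(seg.sum_where(metric, column, value) for seg in segments)


segments = [
    ColumnarSegment(
        [{"country": "US", "clicks": 3}, {"country": "DE", "clicks": 5}],
        indexed_columns=["country"],
    ),
    ColumnarSegment(
        [{"country": "US", "clicks": 7}],
        indexed_columns=["country"],
    ),
]

us_clicks = query_table(segments, "clicks", "country", "US")
```

In a real cluster the per-segment work would be spread across nodes and merged afterwards; the sketch just runs it in a loop.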

Also per https://docs.pinot.apache.org/basics/getting-started/frequen...

Q: When are new events queryable when getting ingested into a real-time table?

A: Events are available to queries as soon as they are ingested. This is because events are instantly indexed in memory upon ingestion.

The ingestion of events into the real-time table is not transactional, so replicas of the open segment are not immediately consistent. Pinot trades consistency for availability upon network partitioning (CAP theorem) to provide ultra-low ingestion latencies at high throughput. However, when the open segment is closed and its in-memory indexes are flushed to persistent storage, all its replicas are guaranteed to be consistent, with the commit protocol.

... Q: Why are segments not strictly time-partitioned?

A: It might seem odd that segments are not strictly time-partitioned, unlike similar systems such as Apache Druid. This allows real-time ingestion to consume out-of-order events. Even though segments are not strictly time-partitioned, Pinot will still index, prune, and query segments intelligently by time intervals for the performance of hybrid tables and time-filtered data. When generating offline segments, the segments are generated such that segments only contain one time interval and are well partitioned by the time column.
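My reading of that last answer is that pruning can work off each segment's time range even when ranges overlap. Here's a minimal Python sketch of that idea, assuming each segment carries min/max values of its time column as metadata. `SegmentMeta` and `prune_by_time` are hypothetical names of mine, not anything from Pinot, and the real pruning logic is surely more involved.

```python
from dataclasses import dataclass


@dataclass
class SegmentMeta:
    """Hypothetical per-segment metadata: the time range covered by its rows."""
    name: str
    min_time: int
    max_time: int


def prune_by_time(segments, query_start, query_end):
    """Keep only segments whose [min_time, max_time] range overlaps the
    query's [query_start, query_end] range (both ends inclusive)."""
    return [
        seg for seg in segments
        if seg.min_time <= query_end and seg.max_time >= query_start
    ]


# Real-time segments may overlap in time because events can arrive out of
# order; the offline segment covers one clean interval.
segments = [
    SegmentMeta("rt_0", 100, 250),
    SegmentMeta("rt_1", 200, 320),
    SegmentMeta("offline_day3", 300, 399),
]

to_scan = [seg.name for seg in prune_by_time(segments, 260, 310)]
```

With these made-up ranges, a query over 260 to 310 skips `rt_0` entirely and only scans `rt_1` and `offline_day3`, so overlap costs some extra scanning but doesn't break correctness.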