
How well does the expression detection scale with the number of columns? If I am reading Table 4 correctly, FastLanes is ~10x slower at encoding than Parquet+Snappy (which seems a reasonable tradeoff for the better compression and scan times), but how is that affected for very wide tables (e.g. 2k columns or something like that)?


That’s a very valid question. We’ve done zero optimization on the encoding side so far, and improving that is definitely on our roadmap. Technically, once we learn the best expressions, they can be reused, since data is often very similar across row groups, which opens the door to caching and amortizing the cost.
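The caching idea could look roughly like the following sketch. This is not FastLanes' actual API; the function and class names are hypothetical, and the detection step is a toy stand-in for the real search over candidate encoding expressions.

```python
# Hypothetical sketch (not FastLanes' real API): cache the best encoding
# expression learned per column and reuse it across row groups, paying
# the detection cost only on a cache miss.

def detect_best_expression(values):
    # Toy stand-in for the expensive search over candidate encodings.
    if all(v == values[0] for v in values):
        return "constant"
    if values == sorted(values):
        return "delta"
    return "dictionary"

class ExpressionCache:
    def __init__(self):
        self._best = {}  # column name -> cached expression

    def expression_for(self, column_name, values):
        expr = self._best.get(column_name)
        if expr is None:
            expr = detect_best_expression(values)
            self._best[column_name] = expr
        return expr

cache = ExpressionCache()
# Row group 1 pays the detection cost ...
e1 = cache.expression_for("user_id", [1, 2, 3, 4])
# ... row group 2 reuses the cached expression without re-detecting.
e2 = cache.expression_for("user_id", [5, 6, 7, 8])
assert e1 == e2 == "delta"
```

In a real implementation the cache would also need an invalidation rule (e.g. periodically re-sampling a row group to confirm the cached expression still compresses well), since data distributions can drift across a file.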

For very wide tables, expression detection only needs to happen once. Beyond that, we’re also exploring techniques like grouping columns into smaller sets or applying more aggressive heuristics to prune irrelevant columns. These are areas we’re actively investigating, and we plan to support them in future versions of FastLanes.
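As a rough illustration of what column grouping and pruning might mean in practice, here is a minimal sketch. All names and thresholds here are illustrative assumptions, not anything from FastLanes: it takes a cheap sample per column, prunes columns whose sample looks incompressible, and splits the rest into fixed-size batches that could be processed independently.

```python
# Hypothetical sketch: group a wide schema into smaller batches and prune
# columns that a cheap sample suggests won't benefit from expression
# detection. Names and thresholds are illustrative, not FastLanes' own.

def distinct_ratio(sample):
    # Cheap proxy for compressibility: near-random columns tend to have
    # almost all-distinct values in a small sample.
    return len(set(sample)) / len(sample)

def plan_detection(columns, samples, batch_size=256, prune_above=0.95):
    # Keep only columns whose sample looks compressible.
    kept = [c for c in columns if distinct_ratio(samples[c]) <= prune_above]
    # Split the survivors into batches that can be detected independently.
    return [kept[i:i + batch_size] for i in range(0, len(kept), batch_size)]

columns = ["c0", "c1", "c2"]
samples = {
    "c0": [1, 1, 1, 1],   # constant: kept
    "c1": [3, 1, 4, 1],   # repeated values: kept
    "c2": [9, 4, 7, 2],   # all distinct: pruned
}
batches = plan_detection(columns, samples, batch_size=2)
# batches == [["c0", "c1"]]
```

Batching bounds the working set per detection pass on a 2k-column table, while the pruning heuristic avoids spending the expensive search on columns that would fall back to a generic encoding anyway.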



