Hacker News | madman2890's comments

with all of it? wow :)


We need the government to release the alien technology to the public now.


Glad you find it useful.


Tabular foundation models like TabPFN and related work are extremely promising. They’re starting to show strong results on many classical tabular ML benchmarks and can reduce the amount of manual modeling work required from data scientists.

However, there is a structural reality of enterprise data that these models don’t remove. The data behind most real-world machine learning problems is not stored in a single clean table. Instead it lives across dozens or hundreds of relational tables: orders, customers, events, transactions, shipments, products, logs, etc. Each table captures part of the signal, often with one-to-many relationships, time dependencies, and high-cardinality entities.

Before any tabular model can be trained, those signals have to be integrated. In practice this means:

- Traversing relational graphs of tables
- Aggregating child tables to parent entities
- Handling time windows and temporal leakage
- Collapsing many-to-many relationships into meaningful features
- Producing a single wide training dataset

This step is usually the most time-consuming part of the entire ML workflow. Even if the model itself becomes automated via a tabular foundation model, the data still has to be prepared.

This is where GraphReduce comes in. GraphReduce treats the relational database as a graph of entities and relationships. Instead of manually writing large SQL pipelines, the user defines the nodes (tables) and their relationships. GraphReduce then walks the graph and performs the required aggregations automatically, generating a single training dataset.
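
To make the "aggregate child tables to parent entities" and "time window" steps concrete, here is a minimal plain-Python sketch. It is not GraphReduce's API; the toy tables, the function name `aggregate_children`, and its parameters are all made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical toy tables; real ones would live in a database or data frames.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Bo"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0, "ts": date(2024, 1, 5)},
    {"order_id": 11, "customer_id": 1, "amount": 30.0, "ts": date(2024, 2, 20)},
    {"order_id": 12, "customer_id": 2, "amount": 15.0, "ts": date(2024, 1, 25)},
    {"order_id": 13, "customer_id": 1, "amount": 99.0, "ts": date(2024, 3, 10)},
]


def aggregate_children(parents, children, key, cutoff, window_days):
    """Roll child rows up to one row per parent.

    Only child rows with cutoff - window_days <= ts < cutoff are used,
    so nothing dated on or after the cutoff can leak into the features.
    """
    start = cutoff - timedelta(days=window_days)
    stats = defaultdict(lambda: {"n": 0, "total": 0.0})
    for row in children:
        if start <= row["ts"] < cutoff:
            stats[row[key]]["n"] += 1
            stats[row[key]]["total"] += row["amount"]

    wide = []
    for parent in parents:
        # Parents with no child rows in the window get zero-valued features.
        s = stats.get(parent[key], {"n": 0, "total": 0.0})
        wide.append({**parent, "order_count": s["n"], "order_total": s["total"]})
    return wide


training_rows = aggregate_children(
    customers, orders, key="customer_id", cutoff=date(2024, 3, 1), window_days=60
)
```

The March order for customer 1 falls after the cutoff and is excluded. Repeating this roll-up along every edge of the table graph is the part that gets tedious by hand.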


“Smart developer’s quirks” tend to peak in 3-8 years of experience and fade off thereafter. A hipster will never fade off and instead continue hipster coding alongside their identity in perpetuity.


Way overdue, but finally got the docs up for my project graphreduce. Feedback, contributions, and stars welcome!


In data pipelines and MLOps projects, I've found it helpful to mirror the code location in the artifact storage location, such as the key layout in an object store. This lightweight little library makes the pattern a drop-in for any project.
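
A rough sketch of the mirroring idea, for anyone who hasn't seen it. This is not the library's API; `artifact_key`, its parameters, and the `artifacts` prefix are hypothetical names chosen for the example.

```python
from pathlib import PurePosixPath


def artifact_key(code_file, repo_root, artifact_name, prefix="artifacts"):
    """Build an object-store key that mirrors where the code lives in the repo."""
    rel = PurePosixPath(code_file).relative_to(PurePosixPath(repo_root))
    # Drop the file extension so the module itself acts as a "directory".
    return str(PurePosixPath(prefix, *rel.with_suffix("").parts, artifact_name))


key = artifact_key("/repo/pipelines/churn/train.py", "/repo", "model.pkl")
# key == "artifacts/pipelines/churn/train/model.pkl"
```

Given a key, you can tell which file produced the artifact without keeping a separate lookup table.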


Taking numerous raw tables to a flat, ML/AI-ready feature vector can be a challenge. This is a lighter-weight way of doing it than a feature store, in case it's useful to you!


I've found feature stores to be overkill for many projects: I just need point-in-time correctness, abstractions for joining lots of tables and flattening them to a specific granularity, and abstractions for composability. I've been building GraphReduce, which helps with all of these. Hope it serves others!
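
By point-in-time correctness I mean that a training row only sees values recorded before its own timestamp. A minimal sketch of that lookup, assuming each entity's history is already sorted by timestamp (the function and data here are illustrative, not GraphReduce code):

```python
from bisect import bisect_left


def as_of_lookup(history, entity_id, as_of):
    """Return the latest value recorded strictly before `as_of`, or None.

    history maps entity_id -> list of (timestamp, value) sorted by timestamp.
    """
    rows = history.get(entity_id, [])
    timestamps = [ts for ts, _ in rows]
    # bisect_left gives the first index whose timestamp is >= as_of,
    # so the entry just before it is the last one strictly earlier.
    idx = bisect_left(timestamps, as_of)
    return rows[idx - 1][1] if idx > 0 else None


# Hypothetical history: customer tier changes keyed by day number.
tier_history = {1: [(1, "bronze"), (5, "silver"), (9, "gold")]}

tier_on_day_5 = as_of_lookup(tier_history, 1, as_of=5)
tier_on_day_6 = as_of_lookup(tier_history, 1, as_of=6)
```

A label observed on day 5 gets "bronze" rather than "silver", because the change to silver was recorded on day 5 itself, not before it.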


An early release of the tech: an automation layer that discovers inclusion dependencies and builds a graph of the data, where tables/files are nodes and foreign keys are the edges between them.
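
For anyone unfamiliar with the term: an inclusion dependency holds when every value in one column also appears in another column, which makes the pair a foreign-key candidate. A naive brute-force sketch of the idea follows; it is not the released tool's implementation, and the function name and input shape are assumptions for illustration.

```python
def find_inclusion_dependencies(tables):
    """Find candidate foreign keys by set containment.

    tables maps table_name -> {column_name: [values]}.
    Returns sorted (child_column, parent_column) pairs where every
    non-null child value also appears in the parent column.
    """
    columns = {
        (table, col): {v for v in values if v is not None}
        for table, cols in tables.items()
        for col, values in cols.items()
    }
    edges = []
    for (child_t, child_c), child_vals in columns.items():
        for (parent_t, parent_c), parent_vals in columns.items():
            # Skip same-table pairs and empty columns (trivially contained).
            if child_t == parent_t or not child_vals:
                continue
            if child_vals <= parent_vals:
                edges.append((f"{child_t}.{child_c}", f"{parent_t}.{parent_c}"))
    return sorted(edges)


# Hypothetical toy tables.
tables = {
    "customers": {"id": [1, 2, 3]},
    "orders": {"id": [10, 11, 12], "customer_id": [1, 1, 3]},
}
edges = find_inclusion_dependencies(tables)
```

On real data, containment alone produces false positives (small integer columns tend to be contained in each other), so the resulting edges are candidates that need further filtering.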

