
madman2890 · 2026-03-09T19:04:41

with all of it? wow :)

madman2890 · 2026-03-06T16:44:49

We need the government to release the alien technology to the public now.

madman2890 · 2026-03-05T15:02:07

Glad you find it useful.

madman2890 · 2026-03-05T03:09:26

Tabular foundation models like TabPFN and related work are extremely promising. They’re starting to show strong results on many classical tabular ML benchmarks and can reduce the amount of manual modeling work required from data scientists.

However, there is a structural reality of enterprise data that these models don’t remove. Most real-world machine learning problems are not stored in a single clean table. Instead they live across dozens or hundreds of relational tables: orders, customers, events, transactions, shipments, products, logs, etc. Each table captures part of the signal, often with one-to-many relationships, time dependencies, and high cardinality entities.

Before any tabular model can be trained, those signals have to be integrated. In practice this means:

- Traversing relational graphs of tables
- Aggregating child tables to parent entities
- Handling time windows and temporal leakage
- Collapsing many-to-many relationships into meaningful features
- Producing a single wide training dataset

This step is usually the most time-consuming part of the entire ML workflow. Even if the model itself becomes automated via a tabular foundation model, the data still has to be prepared.

This is where GraphReduce comes in. GraphReduce treats the relational database as a graph of entities and relationships. Instead of manually writing large SQL pipelines, the user defines the nodes (tables) and their relationships. GraphReduce then walks the graph and performs the required aggregations automatically, generating a single training dataset.
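To make the aggregation and time-window steps above concrete, here is a minimal plain-Python sketch. It is not GraphReduce's API; the table contents, column names, and the `aggregate_orders` / `build_training_rows` helpers are all hypothetical and exist only for illustration. It rolls a child table (orders) up to its parent (customers) and only uses rows dated before a cut date, which is one simple way to avoid temporal leakage.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical toy tables; names and columns are illustrative only.
customers = [
    {"customer_id": 1, "signup_date": date(2024, 1, 5)},
    {"customer_id": 2, "signup_date": date(2024, 2, 10)},
]
orders = [
    {"order_id": 10, "customer_id": 1, "order_date": date(2024, 3, 1), "amount": 50.0},
    {"order_id": 11, "customer_id": 1, "order_date": date(2024, 3, 20), "amount": 30.0},
    {"order_id": 12, "customer_id": 1, "order_date": date(2024, 4, 15), "amount": 99.0},
    {"order_id": 13, "customer_id": 2, "order_date": date(2024, 3, 25), "amount": 20.0},
]


def aggregate_orders(orders, cut_date, window_days):
    """Aggregate the child table (orders) to the parent key (customer_id),
    using only rows inside [cut_date - window_days, cut_date)."""
    start = cut_date - timedelta(days=window_days)
    stats = defaultdict(lambda: {"order_count": 0, "order_total": 0.0})
    for row in orders:
        # Rows on or after the cut date are excluded to avoid leakage.
        if start <= row["order_date"] < cut_date:
            s = stats[row["customer_id"]]
            s["order_count"] += 1
            s["order_total"] += row["amount"]
    return stats


def build_training_rows(customers, orders, cut_date, window_days=90):
    """Produce one wide row per customer (the parent granularity)."""
    stats = aggregate_orders(orders, cut_date, window_days)
    rows = []
    for c in customers:
        # Customers with no orders in the window get zero-valued features.
        agg = stats.get(c["customer_id"], {"order_count": 0, "order_total": 0.0})
        rows.append({"customer_id": c["customer_id"], **agg})
    return rows


training_rows = build_training_rows(customers, orders, cut_date=date(2024, 4, 1))
```

With a cut date of 2024-04-01, the April order is left out of the features, so it could safely be used to build a label instead. A real pipeline would repeat this kind of roll-up for every child table in the graph.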

madman2890 · 2025-08-30T16:26:58

“Smart developer’s quirks” tend to peak at 3-8 years of experience and fade thereafter. A hipster's never fade; they will keep hipster coding, alongside their identity, in perpetuity.

madman2890 · on Sept 7, 2024

Way overdue, but finally got the docs up for my project graphreduce. Feedback, contributions, and stars welcome!

madman2890 · on March 8, 2024

In data pipelines or MLOps projects I've found mirroring the code location in the artifact storage location, such as an object store, to be a helpful pattern. This lightweight library makes the pattern a drop-in for any project.
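A rough sketch of the mirroring idea, assuming POSIX-style paths. This is not the library's API; `artifact_key` and the example paths are hypothetical names made up for illustration. The object-store key is derived from where the code lives relative to the project root.

```python
from pathlib import PurePosixPath


def artifact_key(project_root: str, code_file: str, artifact_name: str) -> str:
    """Build an object-store key that mirrors the code file's location."""
    rel = PurePosixPath(code_file).relative_to(PurePosixPath(project_root))
    # Drop the ".py" suffix so the module path acts as a storage "directory".
    return str(rel.with_suffix("") / artifact_name)


# Hypothetical repo layout, used only for the example.
key = artifact_key("/repo", "/repo/pipelines/churn/train.py", "model.pkl")
```

Here `key` comes out as `pipelines/churn/train/model.pkl`, so anyone who knows where the code is can find its artifacts under the same relative path in the bucket.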

madman2890 · on Nov 19, 2023

Taking numerous raw tables to a flat ML/AI-ready feature vector can be a challenge. This is a lighter-weight way of doing it than a feature store, in case it's useful to you!

madman2890 · on Aug 8, 2023

I've found feature stores to be overkill for many projects, as I just need point-in-time correctness, abstractions for joining lots of tables and flattening to a specific granularity, and abstractions for composability. I've been building GraphReduce, which helps with all of the above. Hope it serves others!

madman2890 · on July 19, 2023

An early release of the tech: an automation layer that discovers inclusion dependencies and builds a graph of the data in which tables/files are nodes and foreign keys are edges between them.
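For readers unfamiliar with inclusion dependencies, here is a minimal sketch of the idea, not the released implementation. The toy `tables` dict and the `find_inclusion_dependencies` helper are hypothetical. It flags a column as a candidate foreign key when all of its non-null values appear in a unique column of another table, and returns those matches as graph edges.

```python
# Hypothetical toy tables: table name -> column name -> list of values.
tables = {
    "customers": {"id": [1, 2, 3]},
    "orders": {"id": [10, 11, 12, 13], "customer_id": [1, 1, 2, 3]},
}


def find_inclusion_dependencies(tables):
    """Return edges (child_table, child_col, parent_table, parent_col) where
    every non-null child value also appears in the parent column."""
    edges = []
    for child, child_cols in tables.items():
        for child_col, child_vals in child_cols.items():
            child_set = {v for v in child_vals if v is not None}
            if not child_set:
                continue
            for parent, parent_cols in tables.items():
                if parent == child:
                    continue
                for parent_col, parent_vals in parent_cols.items():
                    # Only treat unique columns as candidate keys.
                    if len(set(parent_vals)) != len(parent_vals):
                        continue
                    if child_set <= set(parent_vals):
                        edges.append((child, child_col, parent, parent_col))
    return edges


edges = find_inclusion_dependencies(tables)
```

On the toy data this finds a single edge, from `orders.customer_id` to `customers.id`. Pure value containment can produce spurious matches on real data (for example, small integer columns), so a practical version would likely add type and name heuristics on top.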