Could this be used to manage ingesting data from messier sources? À la files (PDFs, etc.), web pages, and so on?
edit: Admittedly the website/video tends to talk about data sources in a somewhat hand-wavy way. I'd have loved some real-world examples of how one goes about adding a data source, and also how to handle problems of scale, i.e. passing filters down to the data source rather than drinking from a firehose and filtering after the fact.
With that said, the idea is becoming interesting to me. At the very least, I like the idea of a standardized query interface to "things". It just feels like edge cases might drown me.
To be completely honest, I was planning on doing a "Show HN" in a month or two, but someone posted the link today and it caught me a bit off-guard -- that's why the docs and examples aren't there yet :)
There's definitely a bit of a learning curve because the docs are few and far between, but in general everything is much simpler than it seems. The people who have adopted it so far have been able to do so within a couple of hours, with minimal guidance from me.
As for passing filters to the data source, the API there is still very experimental (it was just an idea in my head until last week!) but I've been working on it recently. Here's a blog post showing a small bit of that new API and using it to dramatically optimize a workload: https://predr.ag/blog/speeding-up-rust-semver-checking-by-ov...
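To make the "passing filters to the data source" idea concrete, here's a rough sketch in Trustfall's GraphQL-flavored query syntax. The vertex and property names ("Crate", "downloads", "$min_downloads") are invented for illustration and aren't from any real schema. Without filter pushdown, the adapter would have to produce every Crate vertex and let the query engine discard the ones below the threshold; with it, the adapter can see the @filter and ask its upstream API for only the matching rows.

```graphql
# Hypothetical schema: "Crate" and its properties are made up for this example.
{
  Crate {
    name @output
    downloads @filter(op: ">=", value: ["$min_downloads"]) @output
  }
}
```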
This is my full-time project right now, so stay tuned and expect a decently steep progress trajectory :)
On your question about ingestion, I see two ways this could fit:
- It could be used as the "ingestion process" itself: write the ingestion as a Trustfall query (or queries) over the data sources, and store the results in a traditional database or other system.
- It could be used to make the "ingestion" a mere implementation detail: you could write a query over all the data sources, and that query could be executed either against the raw data sources themselves or against some "ingested" format, which would presumably be faster and more efficient. The query can't tell the difference, and this frees up the ingestion system to evolve independently of how it's being used (see the sketch below).
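As a rough sketch of that second point (again with an invented schema), the same query text could be handed to an adapter that reads raw files and fetches web pages on the fly, or to an adapter backed by a database those documents were previously ingested into; the query and its caller are identical either way.

```graphql
# Hypothetical "Document" schema; an adapter over raw files/web pages and an
# adapter over an already-ingested database could both serve this same query.
{
  Document {
    path @output
    title @filter(op: "has_substring", value: ["$keyword"]) @output
    linkedPages {
      url @output
    }
  }
}
```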
It sounds like you might have a specific use case in mind? I'd love to learn more about it if so! My Twitter DMs are open, or I could give you my email address if you'd prefer that.