Could this be used to manage ingesting data from messier sources? À la files (PDFs, etc.), web pages, and so on?
edit: Admittedly the website/video tends to talk about data sources in a somewhat hand-wavy way. I'd have loved some real-world examples of how one goes about adding a data source, and also how to handle problems of scale, i.e. passing filters down to the data source rather than drinking from a firehose and filtering after the fact.
With that said, the idea is becoming interesting to me. At the very least, I like the idea of a standardized query interface to "things". It just feels like edge cases might drown me.
To be completely honest, I was planning on doing a "Show HN" in a month or two, but someone posted the link today and it caught me a bit off-guard -- that's why the docs and examples aren't there yet :)
There's definitely a bit of a learning curve because the docs are few and far between, but in general everything is much simpler than it seems. The people who have adopted it so far have been able to do so within a couple of hours, with minimal guidance from me.
As for passing filters to the data source, the API there is still very experimental (it was just an idea in my head until last week!) but I've been working on it recently. Here's a blog post showing a small bit of that new API and using it to dramatically optimize a workload: https://predr.ag/blog/speeding-up-rust-semver-checking-by-ov...
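To make the "passing filters to the data source" idea concrete, here's a rough sketch in Trustfall's GraphQL-flavored query syntax. The vertex and property names ("Crate", "downloads", "$min_downloads") are invented for illustration and aren't from any real schema. Without filter pushdown, the adapter would have to produce every Crate vertex and let the query engine discard the ones below the threshold; with it, the adapter can see the @filter and ask its upstream API for only the matching rows.

```graphql
# Hypothetical schema: "Crate" and its properties are made up for this example.
{
  Crate {
    name @output
    downloads @filter(op: ">=", value: ["$min_downloads"]) @output
  }
}
```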
This is my full-time project right now, so stay tuned and expect a decently steep progress trajectory :)
On your question about ingestion, I see two ways this could fit:
- It could be used as the "ingestion process" itself: write the ingestion as a Trustfall query (or queries) over the data sources, and store the results in a traditional database or other system.
- It could be used to make the "ingestion" a mere implementation detail: you could write a query over all the data sources, and that query could be executed either against the raw data sources themselves or against some "ingested" format, which would presumably be faster and more efficient. The query can't tell the difference, and this frees up the ingestion system to evolve independently of how it's being used (see the sketch below).
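As a rough sketch of that second point (again with an invented schema), the same query text could be handed to an adapter that reads raw files and fetches web pages on the fly, or to an adapter backed by a database those documents were previously ingested into; the query and its caller are identical either way.

```graphql
# Hypothetical "Document" schema; an adapter over raw files/web pages and an
# adapter over an already-ingested database could both serve this same query.
{
  Document {
    path @output
    title @filter(op: "has_substring", value: ["$keyword"]) @output
    linkedPages {
      url @output
    }
  }
}
```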
It sounds like you might have a specific use case in mind? I'd love to learn more about it if so! My Twitter DMs are open, or I could give you my email address if you'd prefer that.