Disclaimer: I'm a founder at Gravwell, a log analytics startup.
I agree. Even when applicable, LLMs are relegated to analyzing subselected data, so logs have to go somewhere else first. I think understanding logs is brain-intensive because it can be a tricky problem. It gets easier with good tools, but often those tools are the kind you use to build something else that solves the problem, rather than solving the problem themselves (e.g. building a good query + automation). I think LLMs can get better at creating the queries, which would help a lot.
We started Gravwell to try to bring some magic. It's a schema-on-read time-series data lake that will eat text or binary, and it comes in SaaS or self-hosted (on-prem) flavors. We built our backend from scratch to offer maximum flexibility in query. The search syntax looks like a Linux command line, and kinda behaves like one too: chain modules together to extract, filter, aggregate, enrich, etc. Automation system included. If you like Splunk, you should check us out.
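To give a feel for the command-line-style chaining, here's a sketch of a pipeline that extracts, filters, and aggregates in one pass. This is illustrative only; the exact module names and flags are in the docs, so treat the specifics here as assumptions:

```
tag=syslog grep sshd              # filter: keep syslog entries mentioning sshd
| regex "user (?P<user>\S+)"      # extract: pull the username into a field
| count by user                   # aggregate: tally entries per user
| table user count                # render: show the results as a table
```

Each module consumes the previous module's output, the same way a Unix pipe does, so you can swap in different extraction or aggregation modules without reshaping the data upstream.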
There's a free community edition (personal or commercial use) for 2GB/day anon or 14GB/day w/ email. Tech docs are open at docs.gravwell.io.
Hey snisarenko, you could use the Gravwell Community Edition which is free up to 2GB/day. 4GB/day if you participate in the current alpha testing program. https://www.gravwell.io/download
Disclaimer: I'm one of the founders. We built it to be a Splunk alternative and I think it does a great job from a data ingest, scalability, and data querying perspective. We're lacking some of the out-of-the-box capability, but not for long. Kits (our "apps") release is coming this quarter.
We haven't published a lot on the backend architecture, but there's some info in the docs. I think it would be interesting to chat. Can you hit me up at info@gravwell.io?
I get what you're saying. I don't think you're wrong. This isn't currently a priority for us but I hope that someday someone builds something like that. That's one of the aspirations of releasing a Community Edition. Our API docs are open: https://dev.gravwell.io/docs/#!api/api.md
It's a bit of a different paradigm. From a deployment perspective we are a few static binaries, and we don't require that you fully understand your data prior to ingesting and operating on it. The storage system is a bit different too, in that it treats storage as a cost center (e.g. use expensive storage when you want speed, and age out to cheaper storage when you need longevity). The short answer is that we are truly unstructured, and will handle a lot of data in its native form.
To your second question, our entire system is built around processing data locally; copying is a HUGE no-no in the platform until you absolutely have to. We're VERY happy with the performance we're getting out of this sucker. That's one of the primary reasons we were dumb enough to start from scratch on the storage and search architecture. We're just glad it paid off.
It's easy to deploy to the cloud and right now we're doing that on a customer-by-customer basis. As we grow I could see turning it into a formal SaaS offering or finding a partner to do so.
Shit, I know the frustration. This one's on me. I'm Corey, one of the co-founders. Pricing is a big "it depends," but average cost savings over Splunk are 30%. For starters, a single-node Basic unlimited data license is $25k annually. Things obviously go up for larger enterprises that need bigger clusters, but our pricing model is a step function rather than the bullshit "pay per GB" model that everyone else seems stuck on. Our view is: it's your hardware, so use it. Every license is unlimited data. You only add more nodes if you need more phatty IOPs for fast searches - hence the "it depends."
Some people use it more like a black box and don't issue many searches so a single node with a shitload of storage is just fine. Others rely on active searching to monitor security incidents and KPIs so they want responsiveness and immediate insights.
Hey - no worries, and I appreciate the forthright (and detailed) response. I Show HN'd something the other day and got beat up a little too, I know how it goes. ;)
I typically use AWS's built-in tools, but I've been looking for something for home, so I'll be checking this out. Thanks.