We self-host Splunk and it can plow through petabytes of high-cardinality data pretty dang fast. If the fields aren't indexed and the search is complex, it can take minutes or hours, but usually I can get live and historic data back in a few seconds.
As an example, we have a pipeline of services. I can compute the time spent in each service at multiple percentile levels and group that data by high-cardinality fields (as in, hundreds of thousands of distinct values or more). I just ran a search over 4 hours of data across thousands of nodes for half a dozen or so services, with multiple eval statements all piped to a timechart doing over a dozen stats operations. Half a billion events, done in under a minute.
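A rough sketch of what that kind of search looks like in SPL (the index, sourcetype, and field names here are made up for illustration; they are not the actual ones from my environment):

```
index=pipeline sourcetype=service_events earliest=-4h
| eval duration_ms = exit_time - enter_time
| eval slow = if(duration_ms > 1000, 1, 0)
| timechart span=5m
    perc50(duration_ms) perc90(duration_ms) perc99(duration_ms)
    avg(duration_ms) sum(slow) count
    by service
```

The nice part is that all of those stats functions run in a single pass over the events, split by the `by` field.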
Splunk charges so much because they are just so dang powerful.
This has not been true in my experience. I run a Splunk server in production and at my data volumes it has been very performant. It's also much easier to set up and maintain than ELK clusters.
In the early days Splunk pricing was exorbitant (we evaluated Splunk 7 years ago and dismissed it), but licensing has changed in recent years and it is now priced by volume ingested (the pricing is transparent and listed on their website now). At low volumes, the pricing is similar to Sumologic, and is pretty accessible now to smaller dev shops. Open-source collectors like fluentd also help to intelligently reduce the ingest volume.
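As one example of trimming ingest before it counts against the license, a fluentd `grep` filter can drop noisy events at the collector. This is just a minimal sketch, assuming the events carry a `level` field; the tag pattern is hypothetical:

```
# Drop debug-level events before they reach the Splunk forwarder.
# The "app.**" tag and the "level" field are assumptions for this sketch.
<filter app.**>
  @type grep
  <exclude>
    key level
    pattern /^debug$/i
  </exclude>
</filter>
```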
Does the speed matter? How exactly?
I am genuinely curious: at Scalyr we _can_ be very fast, but it's a balance against cost, and we want to pass the savings on as lower prices. Same with self-hosted Elastic: you can fine-tune it to be fast, but staying within cost constraints slows it down. WDYT?
Yes, same here. I actually monitor the throughput of the network interfaces on our forwarder with prometheus/statsd_exporter, and if outbound is smaller than inbound it sets off alerts!
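Something like this Prometheus alerting rule captures that check. It's a sketch: the metric, instance, and device names assume node_exporter-style interface counters rather than whatever the statsd mapping actually exports:

```
# Fires when the forwarder sends less than it receives for 10 minutes,
# which can indicate dropped or backlogged events.
# Metric/label names below are assumptions for this sketch.
groups:
  - name: forwarder
    rules:
      - alert: ForwarderOutboundBelowInbound
        expr: |
          rate(node_network_transmit_bytes_total{instance="forwarder", device="eth0"}[5m])
            < rate(node_network_receive_bytes_total{instance="forwarder", device="eth0"}[5m])
        for: 10m
        annotations:
          summary: "Forwarder outbound throughput is below inbound"
```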