Not sure why you got voted down, so I'll vote you back up and answer your totally valid question.
rrdtool is fine if you know your schema in advance, but that is somewhat of a hindrance because you don't always know it, or don't want to have to define it up front.
Graphite is great because it listens on a port and accepts data in a simple unstructured format. Each data point is a single line with three fields: a key, a numeric value, and a timestamp.

The key can use dot notation, so your data forms a tree-like structure. Here's a made-up example: "ehcache.activemq.joins" or "ehcache.activemq.quits". The dashboard then gives you a nice tree to navigate through your data, so you can pin down specific areas you'd like to graph. It also lets you easily mix and match areas as overlays.
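To show how simple that is in practice, here's a minimal Python sketch of pushing one data point to carbon's plaintext listener. It assumes a stock install (carbon's default plaintext port is 2003); on the wire the three fields are space-separated, one line per data point, and the metric names here are made up:

```python
import socket
import time

def format_metric(path, value, timestamp):
    # Carbon's plaintext protocol: one line per data point,
    # space-separated "path value timestamp", newline-terminated.
    return f"{path} {value} {int(timestamp)}\n"

def send_metric(path, value, host="localhost", port=2003):
    # Assumes a carbon daemon listening on its default plaintext port.
    line = format_metric(path, value, time.time())
    with socket.create_connection((host, port)) as sock:
        sock.sendall(line.encode("ascii"))

# e.g. send_metric("ehcache.activemq.joins", 42)
```

There's no registration step: the first time carbon sees a new key, it creates the storage for it, which is exactly why you don't have to define a schema in advance.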
Well, you can use graphite as a frontend to your RRD files if you like. I'm using it to view data gathered by collectd and am planning to throw some RRD files generated by jmxtrans at it soon.
If you were writing a tool that recorded time-series data, there's some explanation of why you might choose whisper over rrdtool at http://graphite.wikidot.com/whisper. I don't know of any other projects that use whisper directly, though, and the creator of whisper is working on a replacement for it named ceres (http://graphite.wikidot.com/roadmap#toc0). The primary appeal of Graphite is the carbon daemon, which lets you send it data over a super-simple protocol and takes care of persisting the data for you.
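To give a flavor of "takes care of persisting it for you": when carbon sees a new key, it creates the whisper file itself, choosing retention and downsampling from pattern rules in its storage-schemas.conf. A made-up example (the section names, patterns, and retention periods here are purely illustrative):

```ini
# storage-schemas.conf: first matching pattern wins
[ehcache]
pattern = ^ehcache\.
retentions = 10s:6h,1m:7d,10m:5y

[default]
pattern = .*
retentions = 60s:30d
```

So you still get rrdtool-style fixed-size archives and automatic rollups, but the per-metric schema decision is deferred to a regex rule instead of being declared up front for every series.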
Given that jmxtrans has native support for writing directly to Graphite, why are you using it to write to rrd files?
If you want those rrd files for other tools, the beauty of jmxtrans is that you can have multiple outputwriters so you can write to rrd AND graphite at the same time, with no loss in performance, since it happens in a multithreaded environment.
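For anyone curious what that looks like, a jmxtrans query lists its outputWriters side by side in the JSON config. A rough sketch only (the writer class names and settings keys here are from memory and the hosts are made up; check the jmxtrans docs before copying):

```json
{
  "servers": [{
    "host": "localhost",
    "port": "1099",
    "queries": [{
      "obj": "java.lang:type=Memory",
      "attr": ["HeapMemoryUsage"],
      "outputWriters": [
        { "@class": "com.googlecode.jmxtrans.model.output.GraphiteWriter",
          "settings": { "host": "graphite.example.com", "port": 2003 } },
        { "@class": "com.googlecode.jmxtrans.model.output.RRDToolWriter",
          "settings": { "templateFile": "memory-rrd-template.xml",
                        "outputFile": "memory.rrd",
                        "binaryPath": "/usr/bin" } }
      ]
    }]
  }]
}
```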
Also note that writing to rrd files from jmxtrans is terribly inefficient: it shells out to the rrdtool binary to do it, because (unfortunately) the Java implementation of rrd produces files that are not compatible with rrdtool.
(I'm the author of jmxtrans. Thanks for using my stuff!)
Graphite is a great tool for quickly seeing what is going on. Generating the graphs was always a bit of a pain point for it. I must say, this is an amazing contribution. Thanks so much.
(I'm the author of http://jmxtrans.googlecode.com which allows you to very easily tie together Graphite & Java Management Extensions (JMX) for monitoring all of your JVM's.)
Another alternative is to consider a service that handles storage/visualization/alerting/etc for your time-series data.
I work for a startup that does exactly that and there are other options in this space as well. Would love any feedback you guys might have: https://metrics.librato.com
I'm so sorry, but I really don't see the value in spending $26.78/month for only 50 metrics. I could install Graphite on a small instance at AWS and get nearly unlimited metrics for a fraction of the cost.
Never mind the fact that if I point your tool at a JVM, I can quickly get over 50 metrics (and double my cost to $53.57 since your slider only goes in increments of 50) just by looking at a single ehcache instance.
Yes, your service is nice, your graphs are pretty, but at the end of the day, I think I'd have a hard time convincing my boss (me) that this is a valuable thing to spend a lot of money on.
It seems like a lot of these 'monitoring' companies are springing up these days. I feel bad for you, because I see that industry quickly being commoditized into a race for the lowest price. It is also not the easiest problem to solve, given the data-storage and availability requirements.
Anyway, I don't mean any ill will. I wish you the best in your business, I just don't see how it would work for me.
I appreciate you taking the time to look over our marketing content and give your frank opinion :-), no ill will perceived at all. It helps us to know where we need to improve our communication around the value we are providing. In that spirit, I'll take a brief stab at addressing some of your concerns here:
1) Your time is the most expensive resource. The cost of hosting your own solution is almost always dwarfed by the cost of the time you spend configuring it, maintaining it, and recovering it in the face of failures. We provide the same value here as any SaaS team in any vertical: we care for and manage the infrastructure, and we are constantly developing and rolling out new features. Of course, the size of that investment depends on your experience, but we intend to save a lot of people a lot of time.
2) A small EC2 instance costs $61/month and has finite disk bandwidth (and CPU). You might get more than 50 metrics out of it, but it's going to come up a long way short of "nearly unlimited". Most people I know running serious Graphite installations end up needing colocated physical hardware with SSDs. That's going to cost you more like $1K-$2K/month, and you will still have a SPOF unless you double that cost. We handle all the scaling and reliability for you.
3) Our pricing is completely linear in the number of metrics; the steps in the slider are just there to make it easier to jump between round numbers. There are no stepwise increases that double your costs. I appreciate your comment here, as it had not occurred to me that someone might (reasonably) infer stepwise pricing.
4) We also include other valuable tools, like threshold-based alerting on all your data streams, with GUI integration to third-party services like Campfire, PagerDuty, email, and custom webhooks, with more to come.
5) A lot of 'monitoring' companies are springing up these days because the market is clamoring for them ;-). While some teams would rather handle these things in-house, a lot of others would rather focus on building their core business value and outsource infrastructure headaches. It's the same economics pushing teams to outsource version control, logging, hosting, etc.
Thanks! One more thing, right after hitting reply, I realized the numbers you saw sounded too high. So I checked out our pricing page and it turns out we had a regression in the estimator. 50 metrics reported every 60 seconds costs $4.46, not $26.78. For the $26.78 you can push 300 metrics every 60 seconds. Just pushed out a fix and it shows the correct values now, so this exchange was really helpful!
I myself was skeptical when reading the OP's numbers, but if I can get reliable metrics like that for just $4.46 a month, I might even consider using your service! You should definitely edit your post above to note that the quoted price was too high for the numbers given, since not everyone will read this far into the thread to discover it.
Wow, that makes a HUGE difference in price. Now it makes a lot more sense and leaves me wondering about your QA practices. Just pulling your chain. ;-)
Anyway, the fact that you are responding here and being active would definitely make me feel like your company would be the first one to check out. This is similar to how I feel about WePay: their CEO and engineers read HN and respond to comments. It is the dawn of a new era of access and support.
Your service actually looks very interesting. Would you be able to tell us about your backend? How are you making it real-time, and how are you handling storage?
Sure thing! Our backend is horizontally scalable and triply redundant by design. We have a web tier behind an ELB that runs our API (http://dev.librato.com/v1/post/metrics). It's implemented in Sinatra today. We'll probably switch to something more performant in the future, but for now it's cheaper to throw more instances behind the load-balancer and spend our time building out more features in the service.
When metrics come into one of the API instances, it turns around and inserts them into a Cassandra cluster that we're running spread across three availability zones with a replication factor of 3, meaning your data is stored in three different availability zones.
The "GET" calls to pull data out of our service go through the same API, so as soon as data is written to the cluster, it can be pulled back out. Hence the marketing term "realtime".
As this post recognizes, there are a lot of components bundled under the name Graphite:
* There's whisper, the file format carbon uses to store time-series data.
* There's carbon, the daemons that accept data over the network, combine it, and write it to whisper files.
* There's graphite, a Django application that can read data from whisper files or RRD files. Graphite features several user interfaces of its own, as well as an API to render the data as graphs or as numerical values.
What I find most interesting is the potential of graphite's API as an intermediary service between your metrics storage (whether it's RRD, whisper, or some other format that you add support for) and the applications that need to consume those metrics (e.g. your monitoring system, your dashboards).
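As a sketch of that intermediary role: graphite-web's render endpoint can return raw JSON instead of a PNG, so a monitoring check can consume the exact same series a dashboard graphs. The host and target below are made up:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def fetch_series(base_url, target, frm="-1h"):
    # format=json returns [{"target": ..., "datapoints": [[value, ts], ...]}]
    qs = urlencode({"target": target, "from": frm, "format": "json"})
    with urlopen(f"{base_url}/render?{qs}") as resp:
        return json.load(resp)

def latest_value(series):
    # Whisper pads missing time slots with nulls, so walk backwards
    # to the newest real data point.
    for value, _ts in reversed(series["datapoints"]):
        if value is not None:
            return value
    return None

# e.g. series = fetch_series("http://graphite.example.com",
#                            "ehcache.activemq.joins")
#      then feed latest_value(series[0]) to your alerting check
```

The same two calls back a dashboard widget or a nagios-style check equally well, which is the whole point of treating graphite as a service layer over the storage format.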
I've been working on a less-configurable sort of graphite dashboard for use as an ambient display: https://github.com/potch/statsdash. It looks great on a tablet.