Linux System Monitoring Tools

daten · on Nov 15, 2010

I suggest dstat over vmstat: it has color-coded output and abbreviates units automatically. It's easy to add columns or monitor specific devices or interfaces.

http://dag.wieers.com/home-made/dstat/

I suggest OpenNMS as a Cacti and Nagios alternative. It eliminates most of the manual configuration: it can automatically detect nodes and services, and if you give it SNMP information it can monitor the specifics of each machine. I've used it to monitor hundreds of machines, but it can be resource intensive.

http://www.opennms.org/

iftop is also a nice lightweight alternative to iptraf and helps track down bandwidth-heavy processes and connections.

http://ex-parrot.com/pdw/iftop/

dmytton · on Nov 15, 2010

These tools are great for looking at what's happening now if you're logged into the server.

They're complemented by monitoring products like:

Self hosted:

- Nagios (already mentioned in the post)

- Cacti / Munin

Hosted:

- http://www.serverdensity.com (tool my company produces)

- http://www.cloudkick.com (monitoring + cloud infrastructure management)

- http://www.scoutapp.com

These give you similar metrics plus various other things like alerting, graphs, mobile apps, etc.

josephruscio · on Nov 15, 2010

http://librato.com is another hosted product (disclosure: I work on this) for systematically monitoring/managing applications.

Throlkim · on Nov 15, 2010

Does anybody actually use top rather than htop? htop is the first thing I install on every system I build.

Something I've become very fond of recently is Monit, which doesn't appear to be on the list. I've found it very reassuring to have Monit set up and watching the processes on my server.

stavros · on Nov 15, 2010

I came here to say the same thing. htop is streets ahead of top.

seiji · on Nov 15, 2010

atop is fun too

geoka9 · on Nov 15, 2010

Another one I find quite useful is iotop:

http://guichaz.free.fr/iotop/

Very handy to quickly see what process is causing that disk thrashing, for example.

fossguy · on Nov 15, 2010

For people who care about security, I would add these monitoring tools:

- OSSEC: log and file system security monitoring (http://ossec.net)

- Snort: network-based IDS (http://snort.org)

- Sucuri (not free): web site monitoring (http://sucuri.net)

zoomzoom · on Nov 15, 2010

I have also seen Munin, which provides robust monitoring.

tszming · on Nov 15, 2010

ps_mem.py determines how much RAM is currently being used per program. It is useful when top misreports actual usage because memory is shared copy-on-write among multiple processes.

http://www.pixelbeat.org/scripts/ps_mem.py

http://wiki.apache.org/spamassassin/TopSharedMemoryBug
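
To make the shared-memory point concrete, here is a minimal sketch of the underlying idea. It is not ps_mem.py's actual code. It assumes a Linux kernel that exposes Pss (proportional set size) in /proc/<pid>/smaps, where each shared page is divided among the processes mapping it, so summing Pss does not count shared pages once per process. The function names pss_kib and read_pss_kib are made up for illustration.

```python
import re

# Matches lines such as "Pss:                 12 kB" in /proc/<pid>/smaps.
# Anchoring on "Pss:" at line start skips "SwapPss:" and "Pss_Anon:" style fields.
_PSS_LINE = re.compile(r"^Pss:\s+(\d+)\s+kB", re.MULTILINE)


def pss_kib(smaps_text):
    """Sum every Pss field (in KiB) found in the text of one smaps file."""
    return sum(int(value) for value in _PSS_LINE.findall(smaps_text))


def read_pss_kib(pid):
    """Read /proc/<pid>/smaps (Linux only; needs permission to read that process)."""
    with open(f"/proc/{pid}/smaps") as handle:
        return pss_kib(handle.read())
```

Calling read_pss_kib(os.getpid()) gives the figure for the current process; adding it up across all PIDs of one program gives a per-program total that should be closer to reality than summing RSS.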

jazzyb · on Nov 15, 2010

In addition to tcpdump, I'd like to add tshark. It usually comes bundled with Wireshark and lets you use the same search capabilities as Wireshark from the command line. I find it much easier to use than tcpdump, especially if you already have experience with Wireshark.

pixdamix · on Nov 15, 2010

When it comes to Wireshark and remote servers, I often do this:

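    # Capture on the remote host (skipping the SSH session's own traffic)
    # and stream the raw pcap over ssh into a local Wireshark.
    ssh root@someserver "/tmp/tcpdump -i any -p -s0 -w - not port 22" | \
        wireshark -i - -k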

jsaparov · on Nov 15, 2010

I often use basic command-line tools (vm/io/snmpstat, or fiddling with /proc using cat/cut) and chart the results along the way, in real time, with this little tool: http://freshmeat.net/projects/trend
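
As a rough illustration of the "/proc plus a chart" approach, the sketch below turns two samples of /proc/stat into a busy-CPU percentage and prints one number per line, which is the kind of stream most plotting tools can consume. Whether a given charting tool accepts it unmodified is an assumption. The names cpu_times, busy_percent and sample are illustrative only.

```python
import time


def cpu_times(stat_text):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Order: user nice system idle iowait irq softirq steal guest guest_nice.
            # guest and guest_nice are already counted in user and nice, so skip them.
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            total = sum(values[:8])
            return total - idle, total
    raise ValueError("no aggregate 'cpu' line found")


def busy_percent(before, after):
    """Percentage of non-idle CPU time between two /proc/stat snapshots."""
    busy0, total0 = cpu_times(before)
    busy1, total1 = cpu_times(after)
    elapsed = total1 - total0
    return 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0


def sample(count=5, interval=1.0, path="/proc/stat"):
    """Print one busy-CPU percentage per line (Linux only)."""
    with open(path) as handle:
        previous = handle.read()
    for _ in range(count):
        time.sleep(interval)
        with open(path) as handle:
            current = handle.read()
        print(f"{busy_percent(previous, current):.1f}", flush=True)
        previous = current
```

Running sample() from a script and piping its output into a grapher gives a crude live CPU chart; the same pattern works for other counters under /proc.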

cagenut · on Nov 15, 2010

Wait, you don't just look at the load average?

daten · on Nov 15, 2010

After you see that you have a high load average, these are the tools you would use to track down why.
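
For what it's worth, the "is the load actually high?" check can be sketched as below. It reads the three averages from /proc/loadavg and divides by the CPU count. The threshold of 1.0 runnable task per CPU is only a rule of thumb I'm assuming here, and the function names are made up. Note that on Linux the load average also counts tasks in uninterruptible sleep (usually waiting on I/O), which is exactly why you need the other tools to find the cause.

```python
import os


def parse_loadavg(text):
    """Parse /proc/loadavg text into (1min, 5min, 15min) floats."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def load_per_cpu(text, cpus):
    """Scale the three load averages by the number of CPUs."""
    return tuple(value / cpus for value in parse_loadavg(text))


def looks_overloaded(text, cpus, threshold=1.0):
    """True if the 1-minute load per CPU exceeds the assumed threshold."""
    return load_per_cpu(text, cpus)[0] > threshold


def current_load_per_cpu(path="/proc/loadavg"):
    """Read the live values (Linux only); cpu_count() may return None, so fall back to 1."""
    with open(path) as handle:
        return load_per_cpu(handle.read(), os.cpu_count() or 1)
```

If looks_overloaded comes back true, that is the point at which htop, iotop, iftop and friends earn their keep.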