Well, why are you triggering on something that has an apparently random amount of delay? Either you draw a line in the sand and accept the occasional false alert, or you monitor the dependency that is causing the variability.
A typical Nagios freshness alert will fire if a passive check result hasn't arrived in X seconds. Sometimes the queue of incoming events gets backed up and Nagios doesn't process the results of service probes until X+5 seconds or even 2X seconds later (due to internal Nagios design problems, not the services actually being delayed).
So Nagios thinks "Service Moo hasn't contacted us in 60 seconds, ALERT!" when the update is actually sitting in the event log, just not processed yet.
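For anyone unfamiliar, that "hasn't contacted us in 60 seconds" behaviour comes from Nagios's freshness checking on passive services. A minimal sketch (the host, service, and command names here are made up):

```
define service {
    host_name               web01            ; hypothetical host
    service_description     Moo
    active_checks_enabled   0                ; results arrive passively
    passive_checks_enabled  1
    check_freshness         1                ; enable freshness checking
    freshness_threshold     60               ; alert if no result within 60s
    check_command           alert_stale_moo  ; forced active check run when stale
}
```

The stale-handler command typically just exits CRITICAL with a "no result received" message, which is exactly what fires spuriously when the event queue backs up.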
I haven't seen this across ~1k services, though I guess it depends to some degree on the spec of the monitoring host, and I realise that 1k+ hosts is likely a different story. If you're submitting passive checks at any serious rate, you should be using NSCA or increasing the frequency at which the external command file is read anyway. This is also another area Icinga handles better - while I say Nagios for convenience's sake, my comments here refer to Icinga (and to Nagios XI, which is comparable but stupidly expensive).
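On the NSCA point: clients ship results to the server as a tab-separated line of host, service, return code, and plugin output, piped into send_nsca. A sketch, with hypothetical host/service/server names and config path:

```shell
# NSCA submission format: host<TAB>service<TAB>return_code<TAB>plugin_output
# (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN). Names below are examples only.
printf 'web01\tMoo\t0\tOK - heartbeat received\n' \
  | send_nsca -H nagios.example.com -c /etc/send_nsca.cfg
```

The daemon on the server side writes these into the external command file as PROCESS_SERVICE_CHECK_RESULT entries, which is where the read-frequency tuning mentioned above comes in.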