elza_1111's comments

elza_1111 · 2025-07-24T17:48:37 1753379317

Hi HN, author here. TL;DR: I wrote this because I believe the hype around AI agents in observability is getting ahead of reality. After building an MCP server for our observability backend, I'm convinced they are powerful hypothesis generators, but not yet reliable problem solvers.

After reading a few articles claiming MCP would be the "end of observability," I felt the need to write down my own, more sceptical take, based on my experience building one of these systems.

My core argument is that these tools are effective at identifying known failure patterns, but they struggle with novel issues. During a high-stakes incident, the risk of following a confident-sounding LLM hallucination down a rabbit hole is dangerously high. Verifying the AI's suggestions can often be just as much work as finding the root cause yourself.

elza_1111 · 2025-07-22T14:34:46 1753194886

I would look at it from a demand-supply perspective. The demand will significantly reduce, and the supply has increased. I also love to look at it from a survival of the fittest perspective. If you are genuinely good at what you do and can drive exceptional results, you don''t have to worry. But if not, you might have to find something where you can "actually provide value" and not just be a fly on the wall.

elza_1111 · 2025-07-22T14:25:22 1753194322

Hi HN, author here.

TL;DR: I wrote this because I believe the hype around AI agents in observability is getting ahead of reality. After building an MCP server for our observability backend, I'm convinced they are powerful hypothesis generators, but not yet reliable problem solvers.

After reading a few articles claiming MCP would be the "end of observability," I felt the need to write down my own, more skeptical take, based on my experience building one of these systems.

My core argument is that these tools are effective at identifying known failure patterns, but they struggle with novel issues. During a high-stakes incident, the risk of following a confident-sounding LLM hallucination down a rabbit hole is dangerously high. Verifying the AI's suggestions can often be just as much work as finding the root cause yourself.

Ultimately, I see these agents as a co-pilot that can brainstorm, but can't yet be trusted to fly the plane.

Curious to hear from other SREs and developers: how are you really using these tools? Are you finding them reliable for RCA, or are you also spending significant time manually verifying their "confident" suggestions?

elza_1111 · 2025-07-19T14:39:24 1752935964

As the original author, some things that I could've potentially included to make it a more complete guide is, - how to collect new telemetry alongside KPS - showcase and correlate application level metrics along with infra in a single-view dashboard maybe? -include the Operator way as well

Anything more to add? Trying to really make this a one-stop guide.

elza_1111 · 2025-07-18T05:05:36 1752815136

There are worse ways to start your day than sitting on still water in a boat that doesn’t want you there, trying to move forward anyway. Turns out that’s a decent metaphor for most things.

Hits home

elza_1111 · 2025-07-15T06:17:10 1752560230

Oh man. Peak evolution

elza_1111 · 2025-07-07T07:44:39 1751874279

MCP in itself is not a widely adopted protocol. Observing such systems is a far cry..

elza_1111 · 2025-06-15T11:22:06 1749986526

yep, SigNoz is OpenTelemetry native. You can instrument your application with OpenTelemetry and send telemetry data direclty to signoz.

elza_1111 · 2025-06-15T11:10:04 1749985804

FYI for anyone reading, OTel does have great auto-instrumentation for Python, Java and .NET also

elza_1111 · 2025-06-15T10:56:27 1749984987

There are integrations that let you monitor your AWS resources also on SigNoz. That said, I personally think CloudWatch is painful in so many other ways as well,

Check this out, https://signoz.io/blog/6-silent-traps-inside-cloudWatch-that...