Given that a large swath of SaaS services, infrastructure providers, and major sites across the internet are impacted, this seems harsh. Are you unhappy with PagerDuty's choice of DNS provider, or something else they have control over? I don't think anyone saw this particular problem coming.
A company that bills themselves as a reliable, highly available disaster handling tool ought to know better than to have a single point of failure anywhere in its infrastructure.
Specifically, they shouldn't have all of their DNS hosted with one company. That is a major design flaw for a disaster-handling tool.
I'm not using the service, but I'm curious what an acceptable threshold for this company is. Like, if half the DNS servers are attacked? If hostile actors sever fiber optic lines in the Pacific?
I ask because my secondary question, as a network noob, is was anybody prepared / preparing for a DDOS on a DNS like this? Were people talking about this before? I live in Mountain View so I've been thinking today about the steps I and my company could take in case something horrifying happens - I remember reading on reddit years ago about local internets, wifi nets, etc, and would love to start building some fail safes with this in mind.
I'm not using the service either, but I noticed this comment [1]. It's not the first time that a DNS server has been DDoS-ed, so it has been discussed before (e.g. [2]). At minimum, I would expect a company that exists for scenarios like this to have more than one DNS server. Staying up when half of existing DNS servers are down is a new problem that no one has faced yet, but this is an old, solved one.
Namely "Uninterrupted Service at Scale -
Our service is distributed across multiple data centers and hosting providers, so that if one goes down, we stay available."
It seems fair to expect them to have a backup dns too, but I am not an expert.
From the perspective of my service being down, my customers being pissed, and me not being notified.. yes, maybe PD should be held to a higher standard of uptime. Seems core to their value prop.