If your status page returns a 502 at the same time that your app returns a 502, you might be in trouble. https://status.discord.com/ really should be on a different domain. Of course the naked 502 from CloudFlare is enough to tell me that there is an outage, but overall this is probably not the ideal way of doing things. Domains are cheap. Get a status page with a different one (or a hosted service).
Some startup could run a `.status` top-level domain (TLD), plus offer an optional SaaS there.
(The SaaS could provide numerous different ways to push status to your branded `.status` domain page, plus an optional watchdog/heartbeat. And the implementation could be resilient against even a major cloud provider outage.)
Using dotStatus could be the no-brainer go-to for companies that know they should have an independent status page, or are required to have it, and don't want to spend a lot of engineering resources doing it right.
You meme, but having status updates on a widely distributed network already being used for other things would make it incredibly resilient. You can already sign messages without involving currency malarkey in most blockchains that matter.
I’d expect this would need to work like a heartbeat / dead man's switch, i.e., the service keeps flagging itself as up, and when it stops flagging, it's considered down. Otherwise you would need a different status monitor to monitor the status agent…
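A minimal sketch of the heartbeat idea, assuming the status service records the time of each push and treats silence longer than a chosen timeout as an outage. The class name `DeadMansSwitch` and the 60-second timeout in the usage are illustrative, not any existing product's API:

```python
import time
from typing import Callable, Optional


class DeadMansSwitch:
    """Marks a service "down" if no heartbeat arrives within `timeout` seconds.

    Hypothetical helper. The clock is injectable so the logic can be tested without sleeping.
    """

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_beat: Optional[float] = None  # no heartbeat seen yet

    def beat(self) -> None:
        # Called whenever the monitored service pushes "still up" (e.g. via an HTTP request).
        self.last_beat = self.clock()

    def status(self) -> str:
        # Silence counts as an outage: either no beat yet, or the last beat is too old.
        if self.last_beat is None:
            return "down"
        if self.clock() - self.last_beat > self.timeout:
            return "down"
        return "up"
```

The design choice is that the status page never needs to reach into the service: if the service (or its network) dies, the beats simply stop and the page flips to "down" on its own.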
Which is fine. Having a standardized .status domain for the pages, even if manually updated, is a big upgrade from the hodgepodge of what's out there now.
IPFS for pages like this is like a random webpage with extra steps - content on IPFS disappears if nobody has pinned it, so I don't think its benefits would be realized.
This is a great idea and caused me to go down a rabbit hole of learning what it would take to actually register a new gTLD.
Short version: ICANN would need to open a new round of gTLD applications. This hasn't been announced but is rumored to begin sometime this year.
Then, a corporation would need to apply for the gTLD string. In 2012 the application fee was $185,000, non-refundable [1]. Smart money is on a steeper fee this time.
Next, the application would actually need to be approved. This requires a company with enough funding to pass ICANN's audit, and the technical chops to run name servers with the bandwidth and availability to serve the traffic generated by the new gTLD [2]. There is also a criminal background check and an anti-cybersquatting check [3].
Finally, they would have to win a contention process against any other registrar applying for the same string. This may be settled by an auction [4].
The buzz around the 2012 process is that it was generally a waste of time and money: many of the new gTLDs far underperformed the registrars' expectations. Further, many of the corporate-owned gTLDs are unused (such as the 76 (!) owned by Amazon) [5].
There were a few winners. ICANN itself made a hefty sum, with some $212 million left over in 2020 [6]. Some other companies figured out the right way to "game" the contention process and intentionally lose the auction for the string [7]. Finally, a company named Donuts, Inc. won some 270 gTLDs under various subsidiaries. With such a massive portfolio, it was able to capture a few big winners (such as .guru) [8].
Given that the ICANN process is potentially long, the dotStatus software, infrastructure, and business-ing can be done in parallel.
And have a backup/interim plan of using a different domain name scheme. (Though `.status` would be awesome, and perhaps even -- dare I say it -- unicorn-scented?)
Whoever gets the funding for the dotStatus startup should bring on all the rest of us as advisors. :)
While I like the idea, I must say it's not going to happen. Companies want full control over the status page, at least those companies that have service level agreements with their customers.
So status.discord.com redirects to discordstatus.com, so it already is on a different domain. But the domain matters very little unless your DNS infrastructure is offline, and in that case your status site's domain needs its own DNS. Unless you're really dedicated, you'll have some SPOF that can take down both (consider even using different domain registrars).
What most companies need is a solid status page that's hosted on very different infrastructure... that is very static or can scale very fast. Often they do the first... but not the second. If something like Discord is down and 10 million daily active users all try to check the status page... they'll just DDoS that out of existence as well. Again, if it's not well built.
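To make the "very static" option concrete: if the status page is a precomputed document with cache headers, a CDN can answer most of those millions of requests without touching the origin. A minimal sketch, assuming a JSON status document; `render_status` and the header values are illustrative choices, and whether a particular CDN honors `stale-if-error` varies:

```python
import json
from typing import Dict, Tuple


def render_status(components: Dict[str, str], max_age: int = 30) -> Tuple[Dict[str, str], bytes]:
    """Render a status snapshot as a static, CDN-cacheable JSON document.

    Hypothetical helper: `components` maps a component name to "up" or "down".
    """
    overall = "up" if all(s == "up" for s in components.values()) else "degraded"
    body = json.dumps({"components": components, "overall": overall}, sort_keys=True).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Shared caches (CDN edges) may serve these same bytes to everyone for max_age
        # seconds, and keep serving a stale copy while the origin is erroring (RFC 5861).
        "Cache-Control": f"public, max-age={max_age}, stale-while-revalidate={max_age}, stale-if-error=86400",
        "Content-Length": str(len(body)),
    }
    return headers, body
```

Because every visitor gets identical bytes, the load on the origin stays roughly one request per edge per `max_age`, regardless of how many users are refreshing.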
> If your status page returns a 502 at the same time that your app returns a 502, you might be in trouble. https://status.discord.com/ really should be on a different domain
Subdomain can be hosted on a different server. You don't need another domain to do that.
No, they don't. One domain can point to multiple name servers. Also, good name servers have great uptime, if you use something like AWS Route 53 rather than some no-name company's.
You can't avoid all dependencies. Certainly, avoiding a dependency on a complicated application server is good, but somewhere in the mix there has to be a DNS nameserver, an HTTPS server, some kind of persistent storage for the status, and some way for you to update that status. If your status page is a static page, then that just means that your persistent storage is wherever the HTML page itself is stored; an S3 bucket, say, or the filesystem of a machine that's running Apache.
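For the static-page case, here is a minimal sketch of what "updating the status" can look like when the HTML file on disk is the persistent storage (e.g. the document root of a machine running Apache). `publish_status` and the page markup are hypothetical; an S3 bucket would need an upload call instead:

```python
import html
import os
import tempfile


def publish_status(path: str, message: str) -> None:
    """Rewrite a static status page; the file itself is the status "database".

    Writing to a temp file in the same directory and then calling os.replace() means
    readers see either the old page or the new one, never a half-written file
    (the rename is atomic on POSIX when both paths are on the same filesystem).
    """
    page = "<!doctype html>\n<title>Status</title>\n" f"<p>{html.escape(message)}</p>\n"
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(page)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave stray temp files behind on failure
        raise
```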
The thing being pointed out in the OP is that, while you can't avoid all dependencies, you can avoid having any shared dependencies (other than core internet infrastructure) between your status page and the service whose status it's reporting on. That way, outage risk will not be correlated between the two, which is generally good enough, since almost no one cares about your status page when your actual service is not experiencing an outage. One effective way to do this is with a service like status.io that specifically hosts status pages and specializes in having very high uptime for just that one kind of page.
If you use the same domain for your status page as for your main service, then that may mean that an outage in your application server, application database, etc., won't affect your status page, but the two will still share a dependency on your load balancer (i.e., whatever your A records are pointing directly at), so if anything goes wrong there then your status page will go down with your main service. If you use a subdomain then there won't necessarily be a shared dependency on the load balancer, but there will be one on the DNS nameservers. The only way to avoid all shared dependencies is to use an entirely separate domain.
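The argument above is essentially a set intersection. A toy sketch with a made-up dependency inventory (`example.com`, `lb:prod`, `status-host` and so on are placeholders) that lists what the main service and each status-page option have in common once unavoidable core internet infrastructure is excluded:

```python
from typing import Dict, Set

# Hypothetical inventory. "internet" stands in for core infrastructure (root/TLD DNS,
# transit) that no status page can avoid depending on.
MAIN_SERVICE: Set[str] = {"internet", "dns:example.com", "lb:prod", "app-servers", "app-db"}

STATUS_OPTIONS: Dict[str, Set[str]] = {
    # example.com/status, served behind the same load balancer
    "same domain": {"internet", "dns:example.com", "lb:prod", "static-host"},
    # status.example.com pointed at separate hosting, but on the same nameservers
    "subdomain": {"internet", "dns:example.com", "status-host"},
    # examplestatus.com with its own nameservers and hosting
    "separate domain": {"internet", "dns:examplestatus.com", "status-host"},
}

UNAVOIDABLE = {"internet"}


def shared_dependencies(service: Set[str], status_page: Set[str]) -> Set[str]:
    """Dependencies whose failure would take down both, excluding core internet infra."""
    return (service & status_page) - UNAVOIDABLE


for name, deps in STATUS_OPTIONS.items():
    print(f"{name}: {sorted(shared_dependencies(MAIN_SERVICE, deps))}")
```

With these placeholder sets, the same-domain option shares the nameservers and the load balancer, the subdomain shares only the nameservers, and the separate domain shares nothing.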
I don't know if Discord's problem that the OP is talking about had anything to do with DNS, but I think that's been a source of outages for them in the past, in which case a separate domain is the solution.
The behavior I observed in this particular case is that their server would time out on requests, especially writes, then come back with a quick 502, then time out again. Their status page showed the same behavior and the same error page. I wouldn’t be surprised if the status page even shares web servers with their main service.
You use multiple DNS servers for the domain, which is common practice. (It's actually mandatory to have two that resolve to different IPs, but in reality this doesn't make them distinct servers, hence why I say "common" instead of "universal".)
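A small sketch of the distinction drawn above between "two nameserver entries" and "two distinct servers", assuming you have already looked up each NS record's IPv4 address. `nameserver_diversity` is a hypothetical helper, and treating a shared /24 prefix as "same network" is a rough heuristic, not a standard:

```python
import ipaddress
from typing import Dict


def nameserver_diversity(ns_ips: Dict[str, str], prefix_len: int = 24) -> Dict[str, int]:
    """Summarize how independent a domain's nameservers really are.

    `ns_ips` maps each NS hostname to the IPv4 address it resolves to. Two NS names on
    one IP, or two IPs in the same /24, may still be one machine or one network.
    """
    ips = {ipaddress.ip_address(ip) for ip in ns_ips.values()}
    networks = {ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False) for ip in ips}
    return {"names": len(ns_ips), "distinct_ips": len(ips), "distinct_networks": len(networks)}
```

For example, `ns1` and `ns2` at 192.0.2.10 and 192.0.2.11 count as two names and two IPs but only one network.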
The point isn't that DNS is bad, but it is often neglected as a problem source, something taken a bit too much for granted.
The Kubernetes community loves this saying because pairing something like Node.js (which doesn't do DNS caching) with a Docker image that has no local DNS cache/resolver, and a Kubernetes cluster that usually doesn't come with a highly scaled DNS server out of the box, often leads to DNS failing more often than anticipated. Also, kube-dns, the now-deprecated default DNS, was extremely shit.
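Why the missing cache matters: a client that resolves the hostname on every request turns its request rate directly into DNS query rate. A minimal sketch in Python (not Node.js, and not any specific library) of an in-process cache with a fixed, assumed 30-second TTL; `CachingResolver` and `system_resolve` are hypothetical names:

```python
import socket
import time
from typing import Callable, Dict, List, Tuple

Resolver = Callable[[str], List[str]]


def system_resolve(host: str) -> List[str]:
    # One real lookup through the OS resolver (no caching of its own here).
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


class CachingResolver:
    """Tiny in-process DNS cache with a fixed TTL.

    Simplification: a real cache would honor each record's own TTL instead of one constant.
    """

    def __init__(self, resolve: Resolver = system_resolve, ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.resolve = resolve
        self.ttl = ttl
        self.clock = clock
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def lookup(self, host: str) -> List[str]:
        now = self.clock()
        hit = self._cache.get(host)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cache hit: no DNS traffic at all
        addrs = self.resolve(host)  # miss or expired entry: do one real query
        self._cache[host] = (now, addrs)
        return addrs
```

With this in place, a thousand requests to the same host within the TTL cost one query instead of a thousand.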
I have a pcap file captured on a Kubernetes node with 60,000 DNS packets in a 0.2 s window. They're all queries for s3.amazonaws.com.
subdomain.example.com can be hosted on completely different infrastructure, but what about DNS? Unless you have two different DNS providers for the main domain, you still have single point of failure for the entire domain (and all subdomains).
I actually did this with Shipped Brain. Our landing page is at https://shippedbrain.com and our app at https://app.shippedbrain.com.
It made iterating on the pages much easier and also provided a clear separation of concerns.
According to https://discordstatus.com/, there isn’t an outage. Are you suggesting that there’s currently an ongoing outage, and their status page is useless? (Wouldn’t be the first time, but it’s surprising that an otherwise excellent engineering team would make such a mistake.)
Self-reporting can be problematic if a company is looking to be acquired. I prefer to use third-party monitoring tools. The paid ones like 1k-eyes are great, but there are some free ones like DownDetector [1] that can help paint a picture, albeit not a perfect one.
DNS can be the cause of the issues. You can misconfigure the server, you can misconfigure NS glue, your registration can lapse, etc. If you have a second domain for your status page, it's less likely that a DNS problem with your primary domain will make that unavailable.
There is a tradeoff, of course. Your automated configuration management system can dutifully push the misconfiguration to both domains. You might not ever configure the second domain correctly, because it's not like customers are visiting and reporting back "hey your status page is broken".
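Since nobody will report that the second domain is broken, it is worth probing it as routinely as the main one. A minimal sketch, assuming a check that only confirms each name resolves (not that the page renders); `check_domains`, `resolve_a`, and the domains in the usage are placeholders:

```python
import socket
from typing import Callable, Dict, List

Resolver = Callable[[str], List[str]]


def resolve_a(host: str) -> List[str]:
    # Real IPv4 lookup via the OS resolver; raises socket.gaierror if the name doesn't resolve.
    return sorted({info[4][0] for info in socket.getaddrinfo(host, 443, family=socket.AF_INET)})


def check_domains(hosts: List[str], resolve: Resolver = resolve_a) -> Dict[str, str]:
    """Resolve each host independently and report "ok" or what went wrong."""
    results: Dict[str, str] = {}
    for host in hosts:
        try:
            addrs = resolve(host)
            results[host] = "ok" if addrs else "no addresses"
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results[host] = f"error: {exc}"
    return results
```

Running it against both `example.com` and `examplestatus.com` on a schedule catches the case where configuration management pushed a bad change to both, or where the status domain was never set up correctly in the first place.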
In my mind, I'm not sure how useful this is in practice, however. I would never think to look for status updates at anything other than status.myapp.com. I'm not sure I'd make the leap to myappstatus.com, and I'm not sure it's going to rank particularly high in search engines. I'd probably look at Twitter second. (I'm not even sure if people routinely check status pages. If you visited my personal website, jrock.us, and it didn't load, would you check status.jrock.us? That actually exists, but I bet that nobody has ever visited it.)
Suppose the outage was caused by an issue with the registrar, DNS, the name server, or the domain registration itself. Having a separate domain for the status page (ideally purchased through a different provider) could prevent a lot of issues.
I believe there are ways to solve this entirely inside of CloudFlare by having them host pages or code snippets that render a "Sorry we're down" page in the event of an outage.
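The logic such an edge snippet would implement is small. This is a generic Python sketch of the pattern, not Cloudflare's Workers API (which has its own request/response types); `serve`, `fetch_origin`, and the page text are hypothetical, and answering 503 for any origin 5xx or connection failure is an assumed policy:

```python
import urllib.error
import urllib.request
from typing import Callable, Tuple

SORRY_PAGE = b"<!doctype html><title>Down</title><p>Sorry, we're down. We're working on it.</p>"

Fetch = Callable[[str], Tuple[int, bytes]]


def fetch_origin(url: str) -> Tuple[int, bytes]:
    # Real fetch with a short timeout; urlopen raises HTTPError for 4xx/5xx responses.
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def serve(url: str, fetch: Fetch = fetch_origin) -> Tuple[int, bytes]:
    """Pass the origin's response through unless the origin is failing."""
    try:
        status, body = fetch(url)
    except OSError:
        # URLError, timeouts and connection errors are all OSError subclasses.
        return 503, SORRY_PAGE
    if status >= 500:
        return 503, SORRY_PAGE  # e.g. a bare 502 from the origin
    return status, body
```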