
Think of SLAs as "this is how hard we'll scramble when shit hits the fan".

Except...I don't even believe that.



It's more "this is our contractual obligation: if we're down more than this, then we might not charge you".


Lawyers are involved, so I'd assume some text about "excluding acts of god, sabotage, etc." to weasel their way out of things. They might even be able to get away with "acts of incompetence", however a lawyer might phrase that to allow their client to weasel.


SLA credits are a thing that actually happen in the industry. I wouldn't automatically assume that they will be able to weasel out of it.

They are typically limited to the amount that you actually paid, though, so basically they don't charge you for the time when you couldn't use the product. You usually won't get more than that.
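Roughly, the cap works like the sketch below (hypothetical function and numbers, not from any actual contract; real agreements define their own credit formulas):

    # Hypothetical pro-rata SLA credit, capped at what you actually paid.
    # Illustrative only; real contracts define their own credit formulas.

    def sla_credit(monthly_fee: float, downtime_hours: float,
                   hours_in_month: float = 730.0) -> float:
        """Refund the slice of the fee covering the outage, never more than the fee."""
        pro_rata = monthly_fee * (downtime_hours / hours_in_month)
        return min(pro_rata, monthly_fee)

    print(sla_credit(monthly_fee=1000.0, downtime_hours=73.0))    # 100.0
    print(sla_credit(monthly_fee=1000.0, downtime_hours=9000.0))  # capped at 1000.0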


That's a good way to get executive approval to replace a system. Google or Apple can get away with this kind of behavior, I doubt Atlassian can.

This outage alone has spurred conversations in Slack about how terrible JIRA is and why we should replace it. If this kind of shit were pulled, I can guarantee we'd be on Shortcut, Linear, or something else in short order.


> Google or Apple can get away with this kind of behavior, I doubt Atlassian can

Atlassian absolutely can in enterprise settings. In my company (a large cloud company), if JIRA goes down, large swathes of the business will also stall, including code deployment (deployments are tracked through change management JIRA tickets). We also use the DC version of Atlassian products, so presumably we aren't at the mercy of Atlassian cloud engineers.


In some industries, three nines isn't exactly stellar. Every service I've worked on recently has demanded five nines of uptime and tons of reporting on latency and even seconds-long outages.
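For scale, here's a quick sketch of the downtime each target allows (my arithmetic, not from the article):

    # Back-of-the-envelope downtime budgets implied by common availability
    # targets (plain arithmetic, independent of any particular SLA's
    # measurement window).

    MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

    for label, availability in [("three nines", 0.999),
                                ("four nines", 0.9999),
                                ("five nines", 0.99999)]:
        downtime = (1 - availability) * MINUTES_PER_YEAR
        print(f"{label}: ~{downtime:.1f} minutes of downtime per year")

    # three nines: ~525.6 minutes (~8.8 hours) per year
    # four nines:  ~52.6 minutes per year
    # five nines:  ~5.3 minutes per year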

I've been on-call during a total infrastructure outage whose root cause was a service my team owned [1]. Our CEO was aware of it. Customers and business partners were aware of it. Other CEOs were aware of it. The media, you name it.

Some outages can be "business ending" or "business damaging". That's why we made a practice of performing regular disaster recovery exercises, kept exceptionally well-documented runbooks, attached monitoring to everything, and engineered for resilience.

Though I'm not familiar with how Atlassian runs, I think this is an "engineering culture" thing or can be mitigated with a proper approach.

[1] The company has only had a few of these in total, and no member of our team was culpable for the complicated failure.


I think of SLAs as an input to how we design the thing. Ask for a system without an SLA and I will give you a system that is well designed and almost never goes down. As soon as you ask for an SLA, I will give you an over-engineered system that costs more, takes longer to implement, and is slower to iterate, but it will almost never go down either.


Per the article, if you experience < 95% uptime in any 30-day window you qualify for a 50% discount. On a month, or your next year, or ... ? It doesn't say.
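For what it's worth, 95% over 30 days is a very low bar; here's the arithmetic (mine, not the article's; the exact measurement rules may differ):

    # What "< 95% uptime in any 30-day window" means in absolute terms.

    HOURS_PER_WINDOW = 30 * 24  # 720 hours in a 30-day window
    THRESHOLD = 0.95

    max_allowed_downtime = (1 - THRESHOLD) * HOURS_PER_WINDOW
    print(f"Downtime needed to trip the 50% discount: more than "
          f"{max_allowed_downtime:.0f} hours in one window")
    # -> more than 36 hours of downtime in a single 30-day window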



