This is a great paper. If you haven't read it: it describes a common scenario in which endemic network delays nudge all the participants in a periodic broadcast protocol toward sending their broadcasts at the same time. Some hours after you start them, everyone has synchronized and, each time the timer fires, saturates the network with updates.
The solution (I didn't reread so this is from memory) is to add random jitter to each participant's timer.
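For anyone who wants to see the shape of that fix, here is a minimal sketch of a jittered timer. The names `next_delay` and `schedule` and the uniform offset are my own assumptions for illustration, not the paper's exact scheme:

```python
import random

def next_delay(period, jitter, rng=random):
    """Return one timer delay: the base period plus a uniform random
    offset in [-jitter, +jitter]. Hypothetical helper, not from the paper."""
    if not 0 <= jitter <= period:
        raise ValueError("jitter must be between 0 and period")
    return period + rng.uniform(-jitter, jitter)

def schedule(period, jitter, rounds, seed=None):
    """Return the send times of one participant over `rounds` broadcasts.
    Each round draws a fresh delay, so participants drift apart instead
    of locking onto a shared tick."""
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(rounds):
        t += next_delay(period, jitter, rng)
        times.append(t)
    return times

if __name__ == "__main__":
    # Two participants with the same 30 s period but independent jitter.
    print(schedule(30.0, 15.0, 5, seed=1))
    print(schedule(30.0, 15.0, 5, seed=2))
```

With `jitter=0` every participant fires on exact multiples of the period, which is the case that can lock up. How large the jitter has to be relative to the period is the interesting question, and this sketch just leaves it as a parameter.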
However, is there evidence to suggest that's what happened to Amazon? I can see this being a big issue in '93, when high-latency, low-bandwidth links were commonplace. But do we think Amazon wasn't engineered well enough to deal with spikes of multiple orders of magnitude in C&C traffic?
Thank you, though, for posting a (much needed) technical comment to this discussion.
I don't think it was a symptom of routing synchronization specifically, but I'd be curious to know if it was a case of unexpected and undesired synchronization (e.g., an independent and random cluster of blocks suddenly updated; the network was saturated; that pulled in more updates; ...).
And yes, the paper talked about randomization. It also pointed out that the magnitude of randomization required was larger than expected.
I posted this yesterday, with the conjecture that it may have been a sudden sync problem. It's a good read.