Is it just me that thinks it a bit wrong that they don't know/care how many db servers they run? If I was running something like that I'd want to know that number.
At their level, it's like you worrying about how many megs your Apache server is taking. From the post, I understood that they have an auto-scaling tool that starts, mounts, and ends (when appropriate) database servers.
I've worked in some environments with pretty massive databases... Every one has known how many servers they have (and the overall capacity). Most have some pretty serious capacity measurement, monitoring and planning methods.
So yeah, I still find it a bit odd. Each to their own I guess.
Sure, I guess if money isn't an issue and you just want the job done, that's what you do to scale.
As long as they're fully optimized to start with, and continually monitor that...
I think that you definitely want an architecture that can handle random failures of database servers. Thus you never know exactly how many you have at a specific time. That's the whole point.
I can understand that with a bank of hot spares, use of a service like EC2, or wild traffic swings, the exact number at any one moment becomes less relevant. But it would be nice for them to have given a historical average (and even the typical variance), based on their logs of actual machines dynamically brought into or out of service.
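To make that concrete, here is a rough sketch of how a time-weighted average and variance could be pulled out of such logs. The event format (a timestamp plus +1 or -1 per machine brought into or out of service) and the function name `fleet_stats` are my own assumptions for illustration; nothing in the post says what their logs actually look like.

```python
def fleet_stats(events, start_count, t_start, t_end):
    """Time-weighted mean and variance of the number of servers in service.

    events: iterable of (timestamp, delta) pairs (hypothetical log format),
            where delta is +1 for a server brought into service and -1 for
            one taken out. Timestamps use the same unit as t_start/t_end.
    start_count: number of servers in service at t_start.
    """
    total = t_end - t_start
    if total <= 0:
        raise ValueError("t_end must be later than t_start")

    count = start_count
    prev = t_start
    weighted_sum = 0.0      # integral of count over time
    weighted_sq_sum = 0.0   # integral of count squared over time

    for ts, delta in sorted(events):
        if not t_start <= ts <= t_end:
            raise ValueError("event outside the measurement window")
        span = ts - prev
        weighted_sum += count * span
        weighted_sq_sum += count * count * span
        count += delta
        prev = ts

    # Account for the stretch between the last event and the end of the window.
    span = t_end - prev
    weighted_sum += count * span
    weighted_sq_sum += count * count * span

    mean = weighted_sum / total
    variance = weighted_sq_sum / total - mean * mean
    return mean, variance
```

Weighting by time matters here: a server that is up for five minutes should not count as much as one that is up all day, which a plain average over log entries would get wrong.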
What is the value of releasing information like this, besides the usual intellectual curiosity?
Does this sharing help the Digg community grow stronger or larger? Or is this meant for potential investors to show they are 'alive and well' behind the curtain?
I think it's a blog post. Sometimes just answering a common question is enough justification for that. But it also probably contributes to both of your points, and gives employees a feeling of contributing to the public face of the company.
So they have master servers and replication. Not to mention they've covered up for MySQL's lack of ability to kill heavy queries automatically by writing some hacky Perl script.
Something tells me using MySQL didn't really help them much, as all serious DBs out there have this shit covered by default.
In fact everything mentioned in this article is stuff I would have with SQL Server Standard Edition, and that's stuff that's been around at least since 2000.
Disclaimer: I'm somewhat of a DB purist and I hate seeing things like this encouraged or praised, when I consider this reinventing an old, old wheel for the millionth time.
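For anyone wondering what that kind of query-killing watchdog amounts to, here is a minimal sketch of the decision logic, in Python rather than Perl. It is not their script: the 30-second threshold, the `protected_users` list, and the function name are all assumptions of mine. The input rows are assumed to be dicts shaped like the output of MySQL's `SHOW FULL PROCESSLIST` (columns `Id`, `User`, `Command`, `Time`, `Info`), and the output is a list of `KILL QUERY` statements to run.

```python
def kill_statements(processlist, max_seconds=30,
                    protected_users=("system user", "repl")):
    """Return KILL QUERY statements for queries running longer than max_seconds.

    processlist: list of dicts shaped like SHOW FULL PROCESSLIST rows.
    max_seconds: hypothetical threshold, tune to taste.
    protected_users: accounts never touched; "system user" is what MySQL
        shows for replication threads, "repl" is a made-up replication account.
    """
    statements = []
    for row in processlist:
        # Only threads actively executing a statement are candidates;
        # idle connections show up with Command == "Sleep".
        if row["Command"] != "Query":
            continue
        if row["User"] in protected_users:
            continue
        if row["Time"] is None or row["Time"] <= max_seconds:
            continue
        # KILL QUERY aborts the running statement but keeps the connection.
        statements.append(f"KILL QUERY {int(row['Id'])}")
    return statements
```

In practice something like this would be run from cron or a loop, feeding it the process list through whatever MySQL driver is at hand and executing each returned statement on the same server.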
Just curious, but how many really high-volume websites actually run on MSSQL?
MySQL might not be all-singing, all-dancing, but being able to handle load by throwing free boxes at it must be nice, and using MSSQL in an environment where you don't even know how many databases you have could be somewhat expensive.