Amazon is expanding its EC2 feature set so rapidly... The pace is mind-blowing to me. Last year, Randy Bias estimated EC2 was pulling $220M revenue/yr: http://cloudscaling.com/blog/cloud-computing/amazons-ec2-gen...
And he estimated an overly conservative 10-20% annual growth. But given the EC2 buzz this year, and personal anecdotes from friends and colleagues using it, my gut feeling is that 2010 revenue will come in 50-100% above 2009.
Is EC2 profitable for Amazon? Very profitable, if you ask me. It is well accepted in the industry that the dominant cost in large-scale datacenters is power and cooling (not hardware, not human resources), and every time I run the numbers in my head, the hourly prices of all instance types come out well above the cost of power & cooling.
Just as an example, we know that this new GPU instance has two 95W Xeon X5570s and two 247W Tesla M2050s. Assume:
(1) a max TDP of 50W for the motherboard and the rest of the server;
(2) instances run under full load 100% of the time and always reach these max TDP numbers (unlikely, but follow me for the sake of the argument);
(3) Amazon uses servers with 80PLUS power supplies (80% efficient or more);
(4) a rather good datacenter with a PUE of 1.3 (power usage effectiveness, which includes overhead from power distribution and cooling; numbers in the range of 1.2-1.4 are often quoted by James Hamilton of the AWS team: http://perspectives.mvdirona.com/);
(5) electricity costs of $0.10/kWh (the US average, though I know Amazon's datacenters are in locations with cheaper electricity).
Then the hourly power and cooling costs would be:
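2 × 95 W + 2 × 247 W + 50 W = 734 W for the components; 734 W / 0.80 (PSU efficiency) × 1.3 (PUE) ≈ 1,193 W drawn from the grid, which at $0.10/kWh is about $0.12/hour.

Amazon charges 17x this amount for on-demand instances ($2.10/hr), and 6x this amount for reserved instances ($0.74/hr). Given these numbers, Amazon must recoup the initial deployment costs very, very quickly... which is why I also think EC2 must be very profitable.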
At an Amazon Tech Summit in London this month, it was mentioned that Jeff Bezos expects AWS to eventually exceed the profits Amazon makes from retail.
Strange. I read James' blog (I even linked to it), and I remember posts demonstrating the reverse: that the cost of servers was not dominant in large-scale data centers, but only in medium- and small-scale ones (think typical enterprise data centers). James used this data to explain why it made sense for enterprises to capture large-scale cost savings by moving to EC2.
Anyway, this does not change my point: the EC2 hourly prices are high enough to eclipse both power & cooling and server costs for Amazon.
It's even more impressive when you know Amazon started AWS to take advantage of excess capacity they had during the off-season when they weren't dealing with the crush of holiday orders.
Well, the Amazon Senior VP of International Retail said this was the case in his talk for the Stanford Entrepreneurial Thought Leadership series. So it seems there's some dispute within the company. But I suppose I should take Werner's word for it, since he's the CTO.
It was premeditated and planned, using all new capacity. I chatted with Chris Pinkham, who architected EC2 and was its VP of Engineering. I also verified this with Chris Brown, who was the development lead for EC2, and with Ben Black, who was leading networking.
This is cool, but you know what would be even cooler? Instances with SSD storage. It's so annoying to have database queries run an order of magnitude faster on my MacBook Air than on a cloud server.
I don't know of any major provider that offers SSD instances. It really is an untapped market.
I don't know if they use SSDs, but Linode's London servers are absolutely screaming fast.
Linode:

    Timing buffered disk reads: 230 MB in 3.01 seconds = 76.44 MB/sec

My local X25-M:

    Timing buffered disk reads: 246 MB in 3.02 seconds = 81.48 MB/sec
This would suggest they are using SSDs (or fast RAID? I'm not sure if this benchmark is sequential access). I get comparable performance for the Georgia data center, but it varies greatly there, so I'm not sure what's going on.
EDIT: Sorry, apparently I can't tell my sde from my sda. My actual SSD performance is:

    Timing buffered disk reads: 564 MB in 3.01 seconds = 187.47 MB/sec
I assume I'm using TRIM now, as I'm running the Ubuntu 10.10 kernel, which supports it on ext4; before this I was running 10.04, which didn't support TRIM. I'm not sure if I should clone the drive and reformat to get more speed, and I'm also not sure whether there are alignment issues (I followed Ted Ts'o's advice on setting that up)...
Sorry if you have already done all this, but do you know if you are using the G1 or G2 X25-M? The G1 doesn't have TRIM; the G2 supports TRIM as long as you are not running the old firmware. So it might be worth checking your firmware.
Btw, here is the hdparm reading for my Western Digital Caviar Black (a regular hard disk), just to give you an idea:

    318 MB in 3.01 seconds = 105.56 MB/sec
I did upgrade my firmware a month ago, but I'm not sure which version I have; I'll check. However, now that you mention it, I do remember the disk doing 200ish MB/sec, so maybe this was a fluke. The drive is almost empty (I use 20 of the 160 GB), so I don't think it's a matter of allocated space...
Linode doesn't use SSDs (yet): the founder has been spotted on their IRC channel in the relatively recent past, mentioning that the cost/performance isn't where they want it yet.
hdparm only does sequential reads, so don't expect a night-and-day difference between SSDs and rotating disks. Plus, if that benchmark was the cached-read one (hdparm -T rather than -t), it reads out of the Linux buffer cache, so you might really be testing your memory speed.
> I don't know if they use SSDs, but Linode's London servers are absolutely screaming fast.
I think it's more likely they are using a high-performance shared storage box. Those are really fast and reliable. The only reason I would use local disks (whether SSDs or spinning metal) would be speed: if, say, a storage box has 200 servers hanging off it and all 200 decide to go nuts with disk access, you won't be able to sustain anything near 76 MB/s per server.
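(For scale: sustaining that rate to all 200 servers at once would take 200 × 76 MB/s ≈ 15 GB/s of aggregate throughput out of a single box.)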
As for the storage boxes themselves, they usually employ piles of ECC RAM, specialized network and disk controllers, SSDs, small fast disks, and larger, slower disks, and they keep moving data around, trying to guess the optimal placement to give you the best possible performance under your varying workload. It must be really cool to design one.
They would have to track wear and charge for the depreciation of a local SSD, since every newly started instance is going to want to overwrite much of the contents. It might make more sense to have SSD space available via SAN to any of your instances.
SSDs are cheap in terms of $/IOPS. In fact, they are about a hundred times cheaper than HDDs (Crucial C300 128GB: $250 for 50k IOPS, vs. an entry-level 7200RPM HDD: $60 for 120 IOPS or so).
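Spelled out: $250 / 50,000 IOPS ≈ $0.005/IOPS for the SSD, versus $60 / 120 IOPS = $0.50/IOPS for the HDD. That's a factor of 100.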
People, stop assuming that $/GB is the only metric that matters.
I wonder how much $/GB matters when people are selling VMs (like, say, Linode). I don't know how many virtual machines are put on one server, but if you are offering 16 GB per VM, and depending on the RAID setup, it might start to add up.
I also don't know if this is still a problem, but there used to be worries about SSDs' limited write endurance; if normal hard drive lifetimes exceed SSD lifetimes by a significant amount, I can see why that would be a problem.
Why can't they offer SSD instances at a higher price than non-SSD instances? Those that want or need the extra performance can pay for it just as they're paying for other niche instances already.
It's not exactly an apples-to-apples comparison, but with your 8 instances rocking 2 × 515 GFLOPS of GPU each, you get just over 8 TFLOPS. Looking back at the TOP500 lists, this "peak" value would have gotten you into the top 15 supercomputers in 2003. (Looking back further, you'd be vying for a top-5 spot in mid-2002.)
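Spelling out the arithmetic, REPL-style:

>>> 8 * 2 * 515   # instances * GPUs per instance * GFLOPS per GPU
8240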
The more depressing observation is that 33.5 ECUs are equivalent to 8 cores @ 2.93 GHz on Intel's recent architecture. This means your typical "small" EC2 instance with 1 ECU is on a par with ~700 MHz of a single modern Intel core. (Highly unscientific, but an interesting ballpark.)
> > The more depressing observation is that 33.5 ECUs are equivalent to 8 cores @ 2.93 GHz
You said:
> I don't know where you got 700 MHz from, because by my math, it'd be equivalent to a 1.43 GHz Nehalem core (33.5 / 8 * 2.93)
Transposed.
>>> 8*2.93/33.5
0.69970149253731351
8 cores at 2.93 GHz is 23.44 GHz, which means one compute unit is 700 MHz.
A modern processor can do more per clock than an older one. It also has larger, faster caches and a faster memory bus, although on the flip side the memory bus is being shared between more CPUs.
Books from 20 years ago were already advising against using FLOPS as an estimate of computational power, and for good reason. I know you admitted it's not a fair comparison, but statements like this help marketing departments convince the masses of "facts" that are highly unscientific.
It is useful to have some sort of measuring stick, however, and my comparison was more a cute attempt to show how this new offering stacks up against recent supercomputers than a serious judgment of its capacity. Sadly, Amazon gave us no specs other than FLOPS to go on...
If you really want to donate, there's always folding@home. Of course, a render farm can do the trick too, if you want purrty ray tracing. Look into Blender.
These machines live up to their promises: they are extremely fast, the networking between instances is fast, and so is their connection to the outside world. I'm eager to see what people do with these.
And the pricing... to quote from the other article[1] on the GPU instances that's on the front page right now:
"An 8 TeraFLOPS HPC cluster of GPU-enabled nodes will now only cost you about $17 per hour."
Does anybody know whether GPU instances can be of any aid for building full-text indices (inverted lists) or other non-floating-point workloads? I was skimming the title of a recent paper presenting a sorting algorithm that exploits GPUs, but I'm still in the mental model of treating GPU workloads as having to do with floating-point operations.
I'm not a GPGPU expert, but one thing that is easy to forget is that floating point types have a well-functioning integer subset. For example, on 32-bit computers it can be beneficial to use 64-bit doubles for extended precision in integer calculations. That said, when I've looked into potentially using GPGPU, the problem has been that branchy code is not a good fit.
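(To see that integer subset concretely: a double's 53-bit mantissa represents integers exactly up to 2**53, which is why it can extend 32-bit integer math. A quick illustration in a Python REPL:)

>>> 2.0**52 == 2.0**52 + 1   # integers below 2**53 are exact in a double
False
>>> 2.0**53 == 2.0**53 + 1   # past 2**53, adjacent integers become indistinguishable
True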
This is wrong: the rate at which integer and logical instructions can be executed per clock is equal to or higher than that of floating-point instructions. For example, each 5-way VLIW unit in an AMD GPU can execute, per clock, 5 integer/logical ops, or 5 single-precision flops, or 1 double-precision flop. Each streaming processor in Nvidia's GT200 GPUs can execute, per (shader) clock, 1 integer/logical op, or 1 single-precision flop, or 0.5 double-precision flops.