Amazon is expanding its EC2 feature set so rapidly... The pace is mind-blowing to me. Last year, Randy Bias estimated EC2 was pulling $220M revenue/yr: http://cloudscaling.com/blog/cloud-computing/amazons-ec2-gen...
And he estimated an overly conservative 10-20% annual growth. But given the EC2 buzz this year, and personal anecdotes from friends and colleagues using it, my gut feeling is that 2010 revenue will come in 50-100% above 2009.
Is EC2 profitable for Amazon? Very profitable, if you ask me. It is well accepted in the industry that the dominant cost in large-scale datacenters is power and cooling (not hardware, not human resources), and every time I run the numbers in my head, the hourly prices of all instance types come out well above the cost of power & cooling.
Just as an example, we know that this new GPU instance has two 95W Xeon X5570s and two 247W Tesla M2050s. Assume:
(1) a max TDP of 50W for the motherboard and the rest of the server;
(2) instances run under full load 100% of the time and always reach these max TDP numbers (unlikely, but follow me for the sake of the argument);
(3) Amazon uses servers with 80PLUS power supplies (80% efficient or more);
(4) a rather good datacenter with a PUE of 1.3 (power usage effectiveness, which includes overhead from power distribution and cooling; numbers in the range of 1.2-1.4 are often quoted by James Hamilton of the AWS team: http://perspectives.mvdirona.com/);
(5) electricity costs of $0.10/kWh (the US average, though I know Amazon's datacenters are in locations with cheaper electricity).
Then the hourly power and cooling costs would be:
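2 × 95 W + 2 × 247 W + 50 W = 734 W for the components; 734 W / 0.80 (PSU efficiency) × 1.3 (PUE) ≈ 1,193 W drawn from the grid, which at $0.10/kWh is about $0.12/hour.

Amazon charges 17x this amount for on-demand instances ($2.10/hr), and 6x this amount for reserved instances ($0.74/hr). Given these numbers, Amazon must recoup the initial deployment costs very, very quickly... which is why I also think EC2 must be very profitable.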
At an Amazon Tech Summit in London this month, it was mentioned that Jeff Bezos expects AWS to eventually exceed the profits Amazon makes from retail.
Strange. I read James' blog (I even linked to it), and I remember posts demonstrating the reverse: that the cost of servers was not dominant in large-scale data centers, but only in medium- and small-scale ones (think typical enterprise data centers). James used this data to explain why it made sense for enterprises to capture large-scale cost savings by moving to EC2.
Anyway, this does not change my point: the EC2 hourly prices are high enough to eclipse both power & cooling and server costs for Amazon.
It's even more impressive when you know Amazon started AWS to take advantage of excess capacity they had during the off-season when they weren't dealing with the crush of holiday orders.
Well, the Amazon Senior VP of International Retail said this was the case in his talk for the Stanford Entrepreneurial Thought Leadership series. So it seems there's some dispute within the company. But I suppose I should take Werner's word for it, since he's the CTO.
It was premeditated and planned, using all new capacity. I chatted with Chris Pinkham, who architected EC2 and was its VP of Engineering. I also verified this with Chris Brown, who was the development lead for EC2, and with Ben Black, who was leading networking.
This is cool, but you know what would be even cooler? Instances with SSD storage. It's so annoying to have database queries run an order of magnitude faster on my MacBook Air than on a cloud server.
I don't know of any major provider that offers SSD instances. It really is an untapped market.
I don't know if they use SSDs, but Linode's London servers are absolutely screaming fast.
Linode:

    Timing buffered disk reads: 230 MB in 3.01 seconds = 76.44 MB/sec

My local X25-M:

    Timing buffered disk reads: 246 MB in 3.02 seconds = 81.48 MB/sec
This would suggest they are using SSDs (or fast RAID? I'm not sure if this benchmark is sequential access). I get comparable performance for the Georgia data center, but it varies greatly there, so I'm not sure what's going on.
EDIT: Sorry, apparently I can't tell my sde from my sda. My actual SSD performance is:

    Timing buffered disk reads: 564 MB in 3.01 seconds = 187.47 MB/sec
I assume I'm using TRIM now, as I'm running the Ubuntu 10.10 kernel, which supports it on ext4; before this I was running 10.04, which didn't support TRIM. I'm not sure if I should clone the drive and reformat to get more speed, and I'm also not sure whether there are alignment issues (I followed Ted Ts'o's advice on setting that up)...
Sorry if you have already done all this, but do you know if you are using the G1 or G2 X25-M? The G1 doesn't have TRIM; the G2 supports TRIM as long as you are not running the old firmware. So it might be worth checking your firmware.
Btw, here is the hdparm reading for my Western Digital Caviar Black (a regular hard disk), just to give you an idea:

    318 MB in 3.01 seconds = 105.56 MB/sec
I did upgrade my firmware a month ago, but I'm not sure which version I have; I'll check. However, now that you mention it, I do remember the disk doing 200ish MB/sec, so maybe this was a fluke. The drive is almost empty (I use 20 of the 160 GB), so I don't think it's a matter of allocated space...
Linode doesn't use SSDs (yet): the founder has been spotted on their IRC channel in the relatively recent past, mentioning that the cost/performance isn't where they want it yet.
hdparm only does sequential reads, so don't expect a night-and-day difference between SSDs and rotating disks. Plus, if that benchmark was the cached-read one (hdparm -T rather than -t), it reads out of the Linux buffer cache, so you might really be testing your memory speed.
> I don't know if they use SSDs, but Linode's London servers are absolutely screaming fast.
I think it's more likely they are using a high-performance shared storage box. Those are really fast and reliable. The only reason I would use local disks (whether SSDs or spinning metal) would be speed: if, say, a storage box has 200 servers hanging off it and all 200 decide to go nuts with disk access, you won't be able to sustain anything near 76 MB/s per server.
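(For scale: sustaining that rate to all 200 servers at once would take 200 × 76 MB/s ≈ 15 GB/s of aggregate throughput out of a single box.)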
As for the storage boxes themselves, they usually employ piles of ECC RAM, specialized network and disk controllers, SSDs, small fast disks, and larger, slower disks, and they keep moving data around, trying to guess the optimal placement to give you the best possible performance under your varying workload. It must be really cool to design one.
They would have to track wear and charge for the depreciation of a local SSD, since every newly started instance is going to want to overwrite much of the contents. It might make more sense to have SSD space available via SAN to any of your instances.
SSDs are cheap in terms of $/IOPS. In fact, they are about a hundred times cheaper than HDDs (Crucial C300 128GB: $250 for 50k IOPS, vs. an entry-level 7200RPM HDD: $60 for 120 IOPS or so).
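Spelled out: $250 / 50,000 IOPS ≈ $0.005/IOPS for the SSD, versus $60 / 120 IOPS = $0.50/IOPS for the HDD. That's a factor of 100.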
People, stop assuming that $/GB is the only metric that matters.
I wonder how much $/GB matters when people are selling VMs (like, say, Linode). I don't know how many virtual machines are put on one server, but if you are offering 16 GB per VM, and depending on the RAID setup, it might start to add up.
I also don't know if this is still a problem, but there used to be worries about SSDs' limited write endurance; if normal hard drive lifetimes exceed SSD lifetimes by a significant amount, I can see why that would be a problem.
Why can't they offer SSD instances at a higher price than non-SSD instances? Those that want or need the extra performance can pay for it just as they're paying for other niche instances already.
It's not exactly an apples-to-apples comparison, but with your 8 instances rocking 2 × 515 GFLOPS of GPU each, you get just over 8 TFLOPS. Looking back at the TOP500 lists, this "peak" value would have gotten you into the top 15 supercomputers in 2003. (Looking back further, you'd be vying for a top-5 spot in mid-2002.)
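Spelling out the arithmetic, REPL-style:

>>> 8 * 2 * 515   # instances * GPUs per instance * GFLOPS per GPU
8240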
The more depressing observation is that 33.5 ECUs are equivalent to 8 cores @ 2.93 GHz on Intel's recent architecture. This means your typical "small" EC2 instance with 1 ECU is on a par with ~700 MHz of a single modern Intel core. (Highly unscientific, but an interesting ballpark.)
> > The more depressing observation is that 33.5 ECUs are equivalent to 8 cores @ 2.93 GHz
You said:
> I don't know where you got 700 MHz from, because by my math, it'd be equivalent to a 1.43 GHz Nehalem core (33.5 / 8 * 2.93)
Transposed.
>>> 8*2.93/33.5
0.69970149253731351
8 cores at 2.93 GHz is 23.44 GHz, which means one compute unit is 700 MHz.
A modern processor can do more per clock than an older one. It also has larger, faster caches and a faster memory bus, although on the flip side the memory bus is being shared between more CPUs.
Books from 20 years ago were already advising against using FLOPS as an estimate of computational power, and for good reason. I know you admitted it's not a fair comparison, but statements like this help marketing departments convince the masses of "facts" that are highly unscientific.
It is useful to have some sort of measuring stick, however, and my comparison was more a cute attempt to show how this new offering stacks up against recent supercomputers than a serious judgment of its capacity. Sadly, Amazon gave us no specs other than FLOPS to go on...
If you really want to donate, there's always folding@home. Of course, a render farm can do the trick too, if you want purrty ray tracing. Look into Blender.
These machines live up to their promises: they are extremely fast, the networking between instances is fast, and so is their connection to the outside world. I'm eager to see what people do with these.
And the pricing... to quote from the other article[1] on the GPU instances that's on the front page right now:
"An 8 TeraFLOPS HPC cluster of GPU-enabled nodes will now only cost you about $17 per hour."
Does anybody know whether GPU instances can be of any aid for building full-text indices (inverted lists) or other non-floating-point workloads? I was skimming the title of a recent paper presenting a sorting algorithm that exploits GPUs, but I'm still in the mental model of treating GPU workloads as having to do with floating-point operations.
I'm not a GPGPU expert, but one thing that is easy to forget is that floating point types have a well-functioning integer subset. For example, on 32-bit computers it can be beneficial to use 64-bit doubles for extended precision in integer calculations. That said, when I've looked into potentially using GPGPU, the problem has been that branchy code is not a good fit.
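(To see that integer subset concretely: a double's 53-bit mantissa represents integers exactly up to 2**53, which is why it can extend 32-bit integer math. A quick illustration in a Python REPL:)

>>> 2.0**52 == 2.0**52 + 1   # integers below 2**53 are exact in a double
False
>>> 2.0**53 == 2.0**53 + 1   # past 2**53, adjacent integers become indistinguishable
True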
This is wrong: the rate at which integer and logical instructions can be executed per clock is equal to or higher than that of floating-point instructions. For example, each 5-way VLIW unit in an AMD GPU can execute, per clock, 5 integer/logical ops, or 5 single-precision flops, or 1 double-precision flop. Each streaming processor in Nvidia's GT200 GPUs can execute, per (shader) clock, 1 integer/logical op, or 1 single-precision flop, or 0.5 double-precision flops.