There are plenty of architectures that do exactly this. EMR-on-S3, Google Datapr...

zbjornson · on Jan 6, 2016

I'm still trying to parse the docs and Manta source code to see what it actually does, but it seems unique if the data storage nodes are also the data processing nodes and no data transfer happens from some storage service before the job begins. The other key factor is having neither startup time nor the cost of a perpetually running cluster. Per my comment below [1], we have used Lambda with S3 to get something like this, as well as our own architecture built on plain EC2/GCE nodes.

[1] https://news.ycombinator.com/item?id=10846514

qaq · on Jan 6, 2016

Not only that, but the thing is built by guys who really know what they are doing, like Bryan Cantrill and other former top Sun people.

vgt · on Jan 6, 2016

Got it. Thanks!

justinsaccount · on Jan 6, 2016

Are you sure you understand what "take the processing to the data" means?

EMR-on-S3 is the "copy the data to the processing nodes" variety.