
With modern FPGAs, the CPU is on the same chip, which means you can use just one chip - so it's OK for cost-sensitive devices - and your problem of moving data across chips is gone as well.

I'm not an "FPGA enthusiast" myself - I work in chip design, and Xilinx is a direct competitor. I think FPGAs have their share of problems preventing their adoption, and I'll write a follow-up to elaborate. But the problems you mention are somewhere between solvable and solved.

(BTW it's true for GPUs just as well: just integrate them on the same chip, and forget about the PCI slots. Every high-end to mid-range cell phone did this a long time ago.)



With modern FPGAs, the CPU is on the same chip, which means you can use just one chip - so it's OK for cost-sensitive devices - and your problem of moving data across chips is gone as well.

The typical CPU on board an FPGA is ludicrously slow compared to the rate at which the FPGA fabric itself can process data.

BTW it's true for GPUs just as well: just integrate them on the same chip, and forget about the PCI slots. Every high-end to mid-range cell phone did this a long time ago.

They pretty much have the same problem, exactly as OP stated. Nobody's using mobile phone GPUs for algorithmic acceleration. Hell, enough of them have huge weak spots in texture upload/download alone.


They're not all that much slower. Take a look at http://zedboard.org/ - that's a dual-core Cortex-A9 attached to a hell of a lot of FPGA fabric. In general, I would expect the data rate "out" of an FPGA to be slower than the rate "in," and the volume of data "out" to be somewhat smaller as well. I would expect to pull in a lot of ADC data, then filter, process, etc., sending on only what I absolutely want to keep.
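As a back-of-envelope sketch of that pull-in-a-lot, send-out-a-little pattern (plain Python with made-up numbers, purely illustrative - not anything running on the Zedboard itself): a moving-average filter followed by decimation shrinks the outbound stream by roughly the decimation factor.

```python
# Sketch: why data "out" of an FPGA front-end is much smaller than data "in".
# A short moving-average filter followed by 16x decimation, as might be
# applied to raw ADC samples before handing results to the CPU.

def filter_and_decimate(samples, taps=4, factor=16):
    """Average over `taps` consecutive samples, keep every `factor`-th result."""
    filtered = [
        sum(samples[i:i + taps]) / taps
        for i in range(len(samples) - taps + 1)
    ]
    return filtered[::factor]

raw = list(range(10_000))          # stand-in for a burst of ADC samples
kept = filter_and_decimate(raw)
print(len(raw), "->", len(kept))   # prints: 10000 -> 625
```

Only the decimated output crosses over to the CPU - about 16x less data than came in from the ADC.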


Sort of a microscale map-reduce.
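The analogy can be made concrete in a few lines (plain Python, purely illustrative): the fabric plays the "map" role, doing the same per-sample work across many samples in parallel, and the on-chip CPU plays the "reduce" role, aggregating the much smaller result.

```python
from functools import reduce

# "Map" stage: per-sample work the FPGA fabric would do in parallel,
# e.g. computing instantaneous power from each sample.
mapped = [x * x for x in range(1000)]

# "Reduce" stage: the small aggregation left for the on-chip CPU.
total = reduce(lambda acc, v: acc + v, mapped)
print(total)
```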


I'm not saying that you can or should do this or that right now with a given piece of hardware - just what is and is not possible. The CPU on the FPGA's die could be fast; embedded GPUs could be better at algorithmic acceleration (here, wait a few years - they have good stuff in the pipeline); OpenCL drivers could become easily available (of course portability would still be a real problem), letting people program embedded GPUs more easily; etc. I've personally been developing accelerators integrated onto the same die as a CPU for a long time, and it works nicely enough.
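For readers unfamiliar with the programming model OpenCL would bring to embedded GPUs: you write a kernel body that runs once per work-item over a grid of global IDs. A minimal emulation of that style in plain Python (no actual OpenCL runtime assumed; `enqueue` is a hypothetical stand-in for the real enqueue call):

```python
# Emulation of an OpenCL-style data-parallel kernel in plain Python.
# In real OpenCL the kernel body runs once per work-item on the device;
# here a simple loop plays the role of the work-item grid.

def vec_add_kernel(gid, a, b, out):
    """Kernel body: each work-item handles one index (its global id)."""
    out[gid] = a[gid] + b[gid]

def enqueue(kernel, global_size, *args):
    """Stand-in for enqueuing a kernel over an ND-range of work-items."""
    for gid in range(global_size):
        kernel(gid, *args)

a, b = [1.0] * 8, [2.0] * 8
out = [0.0] * 8
enqueue(vec_add_kernel, len(out), a, b, out)
print(out)  # prints eight 3.0s
```

The point of the model is that the loop disappears on real hardware: the work-items run concurrently across the GPU's lanes, which is exactly the kind of parallelism an on-die accelerator can exploit.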



