In some recent work from my group [1], we reduce the complexity of keeping up with new SIMD ISAs by retargeting code between generations. For example, a compiler pass can take code written against SSE2 intrinsics and emit AVX-512: it auto-vectorizes hand-vectorized code. With a more capable compiler, programmers and library users get speedups as the ISA grows, without rewriting their code or relying on scalar auto-vectorization. That said, the growth of the x86 ISA certainly pushed some complexity onto us as compiler writers: we had to write a pass to retarget instructions!
A patch was recently contributed to GCC that converts MMX intrinsics to SSE. The GCC Power target also supports x86 vector intrinsics, converting them to their Power equivalents.
It's not as ambitious as your approach, though; it's more of a 1:1 translation and thus can't take advantage of wider vectors.
That patch is primarily there to avoid the pitfalls of MMX on modern architectures, where it is gradually being deprecated. On SKX, operations that are available on both ports 0 and 1 for SSE or AVX are only available on port 0 for MMX, so code that uses MMX gets half the throughput (which may or may not matter, but still).
Thanks for the explanation; I wasn't aware of the reasoning behind it. I would guess that by now all actively maintained performance-critical code has been rewritten in something more modern, so it certainly makes sense for Intel to minimize the number of gates they dedicate to MMX.
[1] https://www.nextgenvec.org/#revec