Hacker News

I wonder if it’s worth the effort to add functionality to GCC and clang to take advantage of this for just these CPUs?


I think the point is that most ordinary C-code should already conform to this access pattern. The side comment about aliasing is a great big hint in that direction.

(This article is grist to the mill of my "every CPU eventually evolves to become an interpreter for its dominant programming language" thesis.)


Yes, AMD probably added the optimization because it was a common pattern emitted by compilers.


I wonder if this causes a performance penalty on position independent code, which seems to be used a lot on non-Windows platforms [1].

[1] It always struck me as one of those propeller-head features that GCC & co. love but which the MS and Intel compilers avoid 'just to be on the safe side' :]


He mentions this doesn't work with RIP-relative addressing, but I don't see how it would cause a penalty otherwise.
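
To make the distinction concrete, here is a minimal sketch, assuming x86-64 with GCC or clang, where file-scope objects are typically addressed RIP-relative and locals are addressed relative to the stack pointer. The names `hits`, `bump_global` and `bump_local` are made up for illustration, and `volatile` is only there so the compiler keeps the loads and stores in memory instead of folding the loops away. Only the first function would miss out if the optimization really skips RIP-relative operands; check the generated assembly (`-S`) to see what your compiler actually emits.

```c
#include <assert.h>

/* File-scope object: on x86-64 this is typically reached through a
   RIP-relative address (an assumption about the target, not a guarantee). */
static volatile long hits;

/* Repeated load/store traffic to a RIP-relative location. */
long bump_global(long n) {
    assert(n >= 0);
    for (long i = 0; i < n; i++) {
        hits += 1;
    }
    return hits;
}

/* The same traffic, but to a stack slot addressed off the stack pointer. */
long bump_local(long n) {
    assert(n >= 0);
    volatile long local = 0;
    for (long i = 0; i < n; i++) {
        local += 1;
    }
    return local;
}
```

Both functions do the same work; any timing difference between them would have to be measured on the CPU in question.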


re your thesis, can't wait for a WASM ISA on a commercial CPU and canvas instructions for its graphics coprocessor


you'll need to wait for WASM to become "the dominant language" on a commercial CPU. definitely not next year.


AFAIK there are existing microarchitectural optimizations in gcc and clang already, so it's reasonable to assume that AMD would add just such a feature.

But most people don't enable these features when building because they often make the resulting executable/library unusable on general CPUs. However, businesses/individuals with more specialized needs and better control over their targets often do capitalize on these.
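
One common way around the "unusable on general CPUs" problem is to pick an implementation at run time. The following is only a sketch of that idea, assuming GCC or clang on x86, where `__builtin_cpu_supports` and the `target` function attribute are available. The function names (`sum_generic`, `sum_avx2`, `pick_sum`) are hypothetical, and on other compilers or architectures the code just falls back to the generic version.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*sum_fn)(const long *, size_t);

/* Baseline version: built with whatever -march the whole file uses. */
static long sum_generic(const long *v, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same loop, but the compiler may use AVX2 inside this one function. */
__attribute__((target("avx2")))
static long sum_avx2(const long *v, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}
#endif

/* Choose an implementation based on the CPU the program is running on. */
sum_fn pick_sum(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        return sum_avx2;
    }
#endif
    assert(sum_generic != NULL);
    return sum_generic;
}
```

The binary still starts on older CPUs because the AVX2 code is only reached after the check succeeds.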


This is the purpose of the gcc -mtune flag, which does have a real use beyond appearing in the utterly redundant -march=native -mtune=native (-march=native already implies -mtune=native).
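
A quick way to see the difference between the two flags, as a sketch assuming GCC or clang on x86: `-march` changes which instruction sets the compiler may use, which shows up in predefined macros such as `__AVX2__`, while `-mtune` only affects scheduling and instruction selection within the allowed set, so it leaves these macros alone. The enum and function names below are made up for illustration.

```c
#include <assert.h>

enum {
    ISA_SSE4_2  = 1,
    ISA_AVX2    = 2,
    ISA_AVX512F = 4
};

/* Report which ISA extensions this translation unit was allowed to use.
   The result depends on -march (or -msse4.2, -mavx2, ...), not on -mtune. */
int compiled_isa_mask(void) {
    int mask = 0;
#ifdef __SSE4_2__
    mask |= ISA_SSE4_2;
#endif
#ifdef __AVX2__
    mask |= ISA_AVX2;
#endif
#ifdef __AVX512F__
    mask |= ISA_AVX512F;
#endif
    /* Sanity check: only the three known bits can be set. */
    assert((mask & ~(ISA_SSE4_2 | ISA_AVX2 | ISA_AVX512F)) == 0);
    return mask;
}
```

Building the same file with `-march=x86-64 -mtune=haswell` and with `-march=haswell` should give different masks, since only the second build is allowed to emit AVX2 instructions.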


Back in the days of compiling MySQL myself, it was worth doing. But at some point the hardware in the cluster was no longer identical, so I scripted the builds and each machine built its own.


    -march=arch -mtune=arch
has existed for decades, and the performance improvement is measurable. But mainstream distributions often cannot take advantage, since they need to support all CPUs, and optimization for one is often a deoptimization for another. It's also a reason why Gentoo exists.

Interestingly, Intel's Clear Linux - an optimization-oriented distribution - uses

    -march=westmere -mtune=haswell
https://docs.01.org/clearlinux/latest/guides/clear/performan...


Westmere is pretty old but, significantly, it is the first generation to support virtual machines with little overhead. If Clear Linux required something newer as its baseline, it would rule itself out for a lot of existing machines (such as the 20+ I have running HPC jobs with surprising reliability).


Yeah, it's a reasonable tradeoff - be compatible with 1st Gen (basically all post-2010 modern x86_64 Intel processors), but try fine-tuning it a little bit for Haswell.


probably yes. It means that spills are cheaper.
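
For anyone wondering what a spill looks like in source terms, here is a rough sketch, assuming GCC or clang and the x86-64 System V calling convention, which has only six callee-saved general-purpose registers. Seven values have to stay alive across a real call, so the compiler will typically store at least one of them to the stack before the call and reload it afterwards. The function names are hypothetical, and whether a spill actually happens depends on the compiler and flags, so look at the `-S` output to be sure.

```c
#include <assert.h>
#include <stddef.h>

/* noinline keeps this a real call, so caller-saved registers are clobbered. */
__attribute__((noinline)) long opaque_step(long x) {
    return x + 1;
}

/* p1..p7 are computed before the call and used after it. With more live
   values than callee-saved registers, some are likely to be spilled to
   the stack and reloaded: a store followed by a load of the same slot. */
long sum_products_across_call(const long *v) {
    assert(v != NULL);
    long p0 = v[0] * v[1];
    long p1 = v[2] * v[3];
    long p2 = v[4] * v[5];
    long p3 = v[6] * v[7];
    long p4 = v[8] * v[9];
    long p5 = v[10] * v[11];
    long p6 = v[12] * v[13];
    long p7 = v[14] * v[15];
    long r = opaque_step(p0);
    return r + p1 + p2 + p3 + p4 + p5 + p6 + p7;
}
```

That store-then-reload of a stack slot is the traffic that would get cheaper.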



