I think the point is that most ordinary C code should already conform to this access pattern. The side comment about aliasing is a great big hint in that direction.
(This article is grist to the mill of my "every CPU eventually evolves to become an interpreter for its dominant programming language" thesis.)
I wonder if this causes a performance penalty on position-independent code, which seems to be used a lot on non-Windows platforms [*].
[*] It always struck me as one of those propeller-head features that GCC & co. love but which the MS and Intel compilers avoid 'just to be on the safe side' :]
AFAIK there are already microarchitecture-specific optimizations in gcc and clang, so it's reasonable to assume that AMD would add just such a feature.
But most people don't enable these features when building because they often make the resulting executable/library unusable on general CPUs. However, businesses/individuals with more specialized needs and better control over their targets often do capitalize on these.
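To make the "unusable on general CPUs" point concrete, here is a minimal sketch of how a C program can report which x86 extensions the compiler was allowed to assume for the build. The function name `build_time_isa` is made up for illustration; the `__SSE4_2__`, `__AVX2__` and `__AVX512F__` macros are the ones gcc and clang predefine when the corresponding instruction sets are enabled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes a space-separated list of the x86 extensions
   this build was compiled to assume into buf and returns how many it found.
   Anything listed here may be used by the compiler anywhere in the binary,
   so a CPU lacking it can fault with an illegal-instruction error. */
int build_time_isa(char *buf, size_t len)
{
    int count = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

#ifdef __SSE4_2__
    strncat(buf, "sse4.2 ", len - strlen(buf) - 1);
    count++;
#endif
#ifdef __AVX2__
    strncat(buf, "avx2 ", len - strlen(buf) - 1);
    count++;
#endif
#ifdef __AVX512F__
    strncat(buf, "avx512f ", len - strlen(buf) - 1);
    count++;
#endif
    return count;
}
```

Built with default flags on x86-64 this typically reports nothing; the list grows once CPU-specific options are turned on, and that is exactly the set of assumptions that makes the binary non-portable.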
Back in the days of compiling MySQL myself, it was worth doing. But at some point the hardware cluster was no longer identical, so I scripted the builds and each machine built its own.
has existed for decades, and the performance improvement is measurable. But mainstream distributions often cannot take advantage, since they need to support all CPUs, and optimization for one is often a deoptimization for another. It's also a reason why Gentoo exists.
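One way a single binary can support all CPUs and still use newer instructions is runtime dispatch. This is not something the thread itself describes, just a rough sketch of the general idea, assuming gcc or clang on x86. The function names are hypothetical; `__builtin_cpu_init`, `__builtin_cpu_supports` and the `target` attribute are real gcc/clang features. On other compilers or architectures the sketch falls back to the generic path.

```c
#include <stdio.h>

/* Baseline version: compiled for whatever the build's default target is. */
static long sum_generic(const int *v, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same loop, but the compiler may use AVX2 instructions in this function only. */
__attribute__((target("avx2")))
static long sum_avx2(const int *v, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Ask the CPU at run time whether AVX2 is available. */
static int cpu_has_avx2(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
}
#else
/* Non-x86 or non-gcc/clang build: always take the generic path. */
#define sum_avx2 sum_generic
static int cpu_has_avx2(void)
{
    return 0;
}
#endif

/* Pick the implementation that the running CPU can actually execute. */
long sum_dispatch(const int *v, int n)
{
    return cpu_has_avx2() ? sum_avx2(v, n) : sum_generic(v, n);
}
```

The cost is extra code size and a check per call (or per program start, if the result is cached), which is part of why per-machine builds can still come out ahead.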
Interestingly, Intel's Clear Linux - an optimization-oriented distribution - uses
Westmere is pretty old but, significantly, it is the first generation to support virtual machines with little overhead. If Clear Linux required something newer, it would rule itself out for a lot of existing machines (such as the 20+ I have running HPC jobs with surprising reliability).
Yeah, it's a reasonable tradeoff - be compatible with 1st Gen (basically all post-2010 modern x86_64 Intel processors), but try fine-tuning it a little bit for Haswell.