Obviously compiling for the target architecture is best, but for most software (things like crypto libraries excluded) 95% of the benefit of AVX2 is going to come from things like vectorized memcpy/memcmp. Building glibc using ifuncs to provide optimized implementations of these routines gives most users most of the benefit of AVX2 (or whatever other ISA extension) while still distributing binaries that work on older CPU microarchitectures.
ifunc memcpy also makes short copies suck ass on those platforms, since the dispatch cost dominates regardless of the vectorization. It's an open question whether ifunc helps or harms the performance of general use cases.
By "open question" I meant that there is compelling research indicating that GNU memcpy/memcmp is counterproductive, but the general Linux-using public did not get the memo.
On the other hand, it also means that your distro can supply a microarchitecture-specific libc and every program automatically gets the memcpy improvements. (Well, except for the golang/rust people.)
Wasn't this the point of Gentoo, back in the day? It was more about instruction scheduling and register allocation differences, but your system would be built with everything optimized for your uarch.