Seriously. A lot of these proposals veer off into second-order considerations ("Easier to decode!" "A few picojoules less energy"). I call them second-order because I'd be very surprised if the bottleneck turned out to be SIMD vs. vector ISA issues rather than, say, memory bandwidth or multiply-add bandwidth.
A few years ago I tried to buy a liquid-cooled, overclocked server for trading. Enabling AVX cost extra because of the concentrated heat output from the MMU and each core's vector unit.
Roughly, for the same price you could get a server tested stable at 5 GHz without AVX or one tested stable at 4.5 GHz with AVX.
So at least on Intel, these vector units are apparently limiting clock speeds and yields due to power consumption.
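For a rough sense of what that clock trade-off means, here's a back-of-envelope sketch. The 5 GHz and 4.5 GHz figures are the ones from the quote above; the per-clock AVX speedup values are placeholders I made up, not measurements, and the model is a plain Amdahl-style split that ignores memory bandwidth entirely.

```python
# Clock figures come from the anecdote above; vector speedups are assumed placeholders.
BASE_GHZ = 5.0  # tested stable without AVX
AVX_GHZ = 4.5   # tested stable with AVX


def relative_runtime(vector_fraction: float, vector_speedup: float) -> float:
    """Runtime of the AVX build at AVX_GHZ relative to the non-AVX build at BASE_GHZ.

    vector_fraction: share of the non-AVX runtime that can be vectorized (0..1).
    vector_speedup: assumed per-clock speedup of that share when using AVX.
    """
    # Cycles needed with AVX, normalized so the non-AVX build needs 1.0.
    cycles = (1.0 - vector_fraction) + vector_fraction / vector_speedup
    # Fewer cycles, but each one takes longer at the lower clock.
    return cycles * (BASE_GHZ / AVX_GHZ)


def break_even_fraction(vector_speedup: float) -> float:
    """Vectorizable share at which both configurations take the same time."""
    clock_ratio = AVX_GHZ / BASE_GHZ
    return (1.0 - clock_ratio) / (1.0 - 1.0 / vector_speedup)


if __name__ == "__main__":
    for speedup in (2.0, 4.0, 8.0):
        share = break_even_fraction(speedup)
        print(f"assumed {speedup:.0f}x vector speedup: break-even at {share:.1%} vectorizable")
```

Below the break-even share, the 5 GHz box without AVX wins under this model; above it, the slower-clocked AVX box wins. For latency-sensitive code that is mostly scalar, that would explain why someone might pay for the higher clock and skip AVX.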