You'd be surprised how many people stop looking at the perf numbers once all the tall tent poles are gone. They have neither the ideas nor the tools for digging beyond that. I've worked on several projects where the customer still wanted the app to be 2-10 times faster (or worse, a competitor was 2-3 times faster), and people looked at the charts and announced that they'd done everything that could be done. Some of the remaining problems hide in plain sight; others just take work to find. For instance, code duplication can hide a tall tent pole because it's chopped into five pieces.
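To make the duplication point concrete, here is a minimal sketch. Everything in it is made up for illustration: the function names, the percentages, and the assumption that the duplicated copies can be recognized by a shared name prefix. In a real codebase you would have to group the copies by whatever actually identifies them.

```python
from collections import defaultdict

# Hypothetical flat profile: function name -> percent of total run time.
# No single entry looks like a tent pole; the tallest bar is only 9%.
flat_profile = {
    "parse_header_v1": 6.0,
    "parse_header_v2": 5.5,
    "parse_header_legacy": 5.0,
    "parse_header_batch": 4.5,
    "parse_header_retry": 4.0,
    "render": 9.0,
    "compress": 8.0,
}


def group_key(name):
    # Assumption: copy-pasted variants share their first two name parts.
    return "_".join(name.split("_")[:2])


def merge_duplicates(profile):
    # Sum the time of every function that maps to the same group,
    # then sort so the biggest combined cost comes first.
    totals = defaultdict(float)
    for name, pct in profile.items():
        totals[group_key(name)] += pct
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


merged = merge_duplicates(flat_profile)
for name, pct in merged:
    print(f"{name:15s} {pct:5.1f}%")
```

Once the five copies are summed, the combined `parse_header` entry is 25% of the run, well above anything visible in the per-function chart.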
I had a shared library running on a bunch of systems. One was VxWorks, cross compiled. I knew nothing about VxWorks except that the code needed to run about 8x faster. I got about 2.5x out of my own code by fixing locality of reference problems that appeared to be intrinsic but just required experience to see. For the rest I filed one bug report with Wind River and five with the cross compiler writer. At the end of this process I knew more about VxWorks administration than anybody else in the group.
People give up, even when there's a clear business case for continuing.