Interestingly, gcc-8 does something odd here. Its version of Multiply is 1.4 times slower than its version of Shift, yet according to the disassembly the roles are swapped: the Shift implementation actually compiles to an imul, whereas the Multiply implementation compiles to a lea/shl sequence.
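To check this kind of thing yourself, it helps to isolate the two variants in a small file and look at what the compiler emits. The following is only a minimal sketch: the function names `multiply` and `shift`, the constant 10, and the `check_equivalence` helper are assumptions made for illustration, not the code that was actually benchmarked.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the "Multiply" variant: a plain multiply by a constant.
std::uint32_t multiply(std::uint32_t x) {
    return x * 10u;
}

// Hypothetical stand-in for the "Shift" variant: the same value built from shifts and an add.
std::uint32_t shift(std::uint32_t x) {
    return (x << 3) + (x << 1);  // 8x + 2x = 10x (mod 2^32)
}

// Sanity check: both variants agree, including when the result wraps around.
void check_equivalence() {
    const std::uint32_t samples[] = {0u, 1u, 7u, 12345u, 0xFFFFFFFFu};
    for (std::uint32_t x : samples) {
        assert(multiply(x) == shift(x));
    }
}
```

Compiling this with something like `g++ -O2 -S` writes the generated assembly to a `.s` file, where you can see which of imul, lea, or shl each function ends up using. Which instructions appear will depend on the compiler version, the optimization flags, and the target, so treat the output as specific to your setup.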