That quote is referring to the A100... the H100 used ~75% more power to deliver "up to 9x faster AI training and up to 30x faster AI inference speedups on large language models compared to the prior generation A100."[0]

Which sure makes the H100 sound both faster and more efficient (per unit of compute) than the TPU v4, given what was in your quote. I don't think your quote does anything to support the position that TPUs are noticeably better than Nvidia's offerings for this task.
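The perf-per-watt arithmetic behind that claim can be sketched quickly. This is a back-of-envelope illustration using NVIDIA's "up to" speedup figures quoted above; the TDP values (roughly 400 W for the A100 SXM and 700 W for the H100 SXM, i.e. ~75% more power) are assumptions for illustration, not measured numbers:

```python
# Back-of-envelope perf-per-watt comparison, H100 vs A100.
# Assumed TDPs: A100 ~400 W, H100 ~700 W (~75% more power).
# Speedups are NVIDIA's "up to" figures for large language models.
a100_power_w = 400.0
h100_power_w = 700.0           # ~1.75x the A100's power draw (assumed)
train_speedup = 9.0            # "up to 9x faster AI training"
infer_speedup = 30.0           # "up to 30x faster AI inference"

power_ratio = h100_power_w / a100_power_w
train_perf_per_watt = train_speedup / power_ratio   # ~5.1x
infer_perf_per_watt = infer_speedup / power_ratio   # ~17.1x
print(f"H100 perf/W vs A100: training {train_perf_per_watt:.1f}x, "
      f"inference {infer_perf_per_watt:.1f}x")
```

Even at 75% more power, the "up to" speedups would imply roughly 5x better training and 17x better inference efficiency per watt, which is why the quote cuts against the TPU-advantage argument rather than for it.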

Complicating this is that the TPU v5 generation has already come out, and the Nvidia B100 generation is imminent within a couple of months. (So, no, a comparison of TPUv5 to H100 isn't for a future paper... that future paper should be comparing TPUv5 to B100, not H100.)

[0]: https://developer.nvidia.com/blog/nvidia-hopper-architecture...



As someone unfamiliar with this area, can one of the downvoters explain why they chose to downvote this? Is it wrong?



