Can anyone hazard a guess as to how Linus was able to measure the cost of a page fault and an iret so precisely? What tools and techniques might he have used?
I don't know what he used, but it's probably based on the model-specific registers (MSRs) that implement the hardware performance counters. See Chapter 18 of the Intel 64 and IA-32 Architectures Software Developer's Manual.