If you statically link your software with the kernel, presumably an optimizing compiler will be able to remove most of the kernel from the resulting binary.
For those who, like me, didn't know what the acronym LTO stands for: it's link-time optimisation, the stage of the build where the "linker" has all of the object files which make up your program available to it, and is therefore able to optimise across the entire program at once (instead of one constituent source file at a time).
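To make that concrete, here is a minimal sketch in C. The file names (util.c, main.c) and function names are made up for illustration, and the two "files" are shown in one listing so it stays self-contained. Without LTO the compiler sees each file on its own, so it can't inline square() into a caller in another file; with -flto the optimiser gets to look at both together at link time.

```c
/* Sketch only: two hypothetical translation units shown in one listing.
 * Split into real files, they could be built with GCC roughly like this:
 *
 *   gcc -O2 -flto -c util.c main.c
 *   gcc -O2 -flto util.o main.o -o demo
 */

/* ---- would live in util.c (hypothetical) ---- */

/* Small helper: a natural candidate for cross-file inlining under LTO. */
int square(int x)
{
    return x * x;
}

/* Never called anywhere: with whole-program visibility the toolchain
 * may be able to drop it from the final executable. */
int unused_helper(int x)
{
    return x + 1000;
}

/* ---- would live in main.c (hypothetical) ---- */

/* Calls square() across the file boundary. */
int sum_of_squares(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += square(i);
    return total;
}
```

Whether the inlining or the dead-code removal actually happens depends on the compiler version and flags, so treat this as an illustration of what LTO makes possible rather than a guarantee.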
The GCC project has a long, detailed PDF about LTO which helped me understand exactly how this works: https://gcc.gnu.org/projects/lto/lto.pdf