Is there any way for software, even in assembly, to make intelligent use of on-cpu ram caches? I thought they were completely abstracted by the hardware.
In addition, many CPUs provide prefetch instructions to make sure data is in cache before it's used, or cache-skipping loads and stores to prevent polluting cache with data that's only ever touched once.