
I'm not disagreeing with you. I acknowledge that there may be a market for CPU-only NN tasks.

I think a thorough benchmark, whether by you or by someone else, would only help your case by giving a clear picture to those who need to make a decision.

Fun fact: GPUs are massively under-utilized during NN training. So it's quite possible that training a NN on a good CPU would be only slightly slower.
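
A minimal sketch of how you could check this yourself, assuming PyTorch and a CUDA box with nvidia-smi on the PATH (the model and training loop here are just placeholders, not any particular workload):

    import subprocess, threading, time, statistics
    import torch, torch.nn as nn

    samples = []
    stop = threading.Event()

    def poll_gpu():
        # Sample GPU utilization (%) about once a second via nvidia-smi.
        while not stop.is_set():
            out = subprocess.run(
                ["nvidia-smi", "--query-gpu=utilization.gpu",
                 "--format=csv,noheader,nounits"],
                capture_output=True, text=True)
            samples.append(int(out.stdout.split()[0]))
            time.sleep(1.0)

    # Placeholder model and synthetic batch, just to keep the GPU busy.
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(256, 1024, device="cuda")
    y = torch.randint(0, 10, (256,), device="cuda")

    threading.Thread(target=poll_gpu, daemon=True).start()
    for _ in range(500):                      # stand-in training loop
        opt.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        opt.step()
    stop.set()
    if samples:
        print("mean GPU utilization: %.1f%%" % statistics.mean(samples))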



GPU underutilization depends on exactly what model you're training. It's not unreasonable to hit 80% or more CUDA core usage on non-recurrent models like convnets, given a sufficiently fast data pipeline and a reasonable batch size. Transformers and recurrent models hit near-100% CUDA core utilization for large portions of each epoch, with the low-utilization stretches limited to the comparatively short weight update at the end. As well, the current rule of thumb is that at the same price point (say, a Xeon 4114 versus an Nvidia Titan RTX), the GPU completes each epoch in about 10% of the time the CPU takes, given the same compute graph... So it's highly unlikely that training will be anywhere close to as fast on a CPU as on a GPU.
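
If anyone wants to sanity-check that rule of thumb, here's a rough sketch assuming PyTorch; the toy convnet, synthetic data, and step count are placeholders, not a real benchmark:

    import time
    import torch, torch.nn as nn

    def time_epoch(device, steps=100, batch=64):
        # One "epoch" of forward/backward/update steps on the given device.
        model = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(64 * 32 * 32, 10),
        ).to(device)
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        x = torch.randn(batch, 3, 32, 32, device=device)
        y = torch.randint(0, 10, (batch,), device=device)
        if device == "cuda":
            torch.cuda.synchronize()           # don't time lazy CUDA init
        start = time.perf_counter()
        for _ in range(steps):
            opt.zero_grad()
            nn.functional.cross_entropy(model(x), y).backward()
            opt.step()
        if device == "cuda":
            torch.cuda.synchronize()           # wait for queued kernels to finish
        return time.perf_counter() - start

    cpu_t = time_epoch("cpu")
    gpu_t = time_epoch("cuda") if torch.cuda.is_available() else float("nan")
    print("cpu: %.2fs  gpu: %.2fs  ratio: %.1fx" % (cpu_t, gpu_t, cpu_t / gpu_t))

The synchronize calls matter: CUDA launches are asynchronous, so without them you'd mostly be timing how fast Python can enqueue kernels rather than how fast they run.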



