It's not overhead, it's about dependency breaking. A 32-bit xor of a register with itself is universally recognized as a zeroing idiom, which means the CPU doesn't have to wait for the results of previous operations on that register before setting it to zero.
In modern CPUs zeroing idioms aren't even executed; they only get as far as the register allocator. The register allocator maps the architectural register the idiom was applied to onto a zeroed physical register, and the job is done.
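To make the rename-stage handling concrete, here is a toy model in Python. It is only a sketch: the `Renamer` class, the `ZERO_PHYS` register and the tuple format for issued operations are all made up for illustration, and no real CPU is claimed to work exactly this way.

```python
# Toy model of register renaming. Everything here (Renamer, ZERO_PHYS, the
# tuple layout) is hypothetical and only illustrates the idea.
ZERO_PHYS = 0  # assumed physical register that always holds zero


class Renamer:
    def __init__(self):
        self.table = {}      # architectural register name -> physical register id
        self.next_phys = 1   # id 0 is reserved for ZERO_PHYS
        self.issued = []     # operations actually sent on to the execution units

    def _alloc(self):
        phys = self.next_phys
        self.next_phys += 1
        return phys

    def _lookup(self, reg):
        # Give a register a physical home the first time it is read.
        if reg not in self.table:
            self.table[reg] = self._alloc()
        return self.table[reg]

    def rename(self, op, dst, src):
        if op == "xor" and dst == src:
            # Zeroing idiom: remap dst to the zero register. Nothing is issued
            # and the old value of dst is never read, so there is no dependency.
            self.table[dst] = ZERO_PHYS
            return None
        # Ordinary two-operand op: reads the current dst and src, writes a new dst.
        sources = (self._lookup(dst), self._lookup(src))
        dest = self._alloc()
        self.table[dst] = dest
        uop = (op, dest, sources)
        self.issued.append(uop)
        return uop


renamer = Renamer()
renamer.rename("imul", "eax", "ebx")         # slow op writing eax
renamer.rename("xor", "eax", "eax")          # handled at rename, returns None
uop = renamer.rename("add", "eax", "ecx")    # reads ZERO_PHYS, not the imul result
print(uop[2][0] == ZERO_PHYS)  # True
```

In this model the `add` only has to wait for `ecx`, because the `xor` cut the link to the earlier `imul` before anything reached the execution units.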
mov reg, 0 is larger because it needs to fit a 32-bit value of 0 in the instruction, so e.g. on x86 it needs 5 bytes, whereas xor reg,reg needs 2. As such, the xor was a common code size optimization, which in turn has led CPU manufacturers to optimize their CPUs to recognize it and treat it even more efficiently.
IIRC mov has no short-immediate form for this: to zero a 32-bit register it needs a full 32-bit constant, and that simply makes the whole instruction larger than a simple xor.
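For anyone who wants to see the size difference, the two encodings can be written out by hand. This is a small Python sketch using `eax` as the example register; the constant names are just labels picked for illustration, and no assembler is involved.

```python
# Hand-written x86 machine code for the two ways of zeroing eax.
MOV_EAX_0 = bytes.fromhex("b8 00 00 00 00")  # mov eax, 0: opcode B8 + 32-bit immediate
XOR_EAX_EAX = bytes.fromhex("31 c0")         # xor eax, eax: opcode 31 + ModRM byte C0

# The immediate takes up four of the five mov bytes.
IMMEDIATE = MOV_EAX_0[1:]

print(len(MOV_EAX_0), len(XOR_EAX_EAX))  # 5 2
```

The three bytes saved are all padding zeros from the immediate; the xor has no immediate field at all.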