Figured I might as well mention it in case AMD was doing something totally wild, but I would agree that using the register file sounds likely :) I wonder how they manage the pressure on it between renaming and memory mirroring…
In principle, if you are using the GP PRF, it is possible to implement it so that there is little additional pressure. The store instruction already has its data input in a register that has been renamed, so you just need to arrange things so that a subsequent load from the same location is effectively a no-op: simply alias the arch reg targeted by the load onto the physical register already used by the store (much like mov-elimination).
So, apart from a small window in between, the pressure on the PRF is the same.
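To make the aliasing idea concrete, here's a toy rename stage in Python. It's only a sketch of the scheme described above, not AMD's actual design: the class, its methods and the address-keyed `store_map` are all hypothetical. As a big simplification, it keys on the memory address at rename time. Real hardware doesn't know load addresses that early, so it would have to predict the store/load pairing and verify it later. Physical registers are refcounted, as with mov-elimination, because several arch regs (plus the mirror entry) can point at the same one.

```python
class Renamer:
    """Toy rename stage with store->load aliasing (hypothetical, simplified)."""

    def __init__(self, num_phys):
        self.free = list(range(num_phys))  # free physical registers
        self.rat = {}        # arch reg -> phys reg
        self.refcount = {}   # phys reg -> number of live references
        self.store_map = {}  # address -> phys reg holding the stored data
        self.allocations = 0

    def _acquire(self, phys):
        self.refcount[phys] = self.refcount.get(phys, 0) + 1

    def _release(self, phys):
        self.refcount[phys] -= 1
        if self.refcount[phys] == 0:  # no arch reg or mirror entry uses it
            del self.refcount[phys]
            self.free.append(phys)

    def _remap(self, arch, phys):
        self._acquire(phys)  # acquire first so remapping to the same reg is safe
        old = self.rat.get(arch)
        self.rat[arch] = phys
        if old is not None:
            self._release(old)

    def write(self, arch):
        """Ordinary instruction writing an arch reg: allocates a fresh phys reg."""
        phys = self.free.pop(0)
        self.allocations += 1
        self._remap(arch, phys)
        return phys

    def store(self, src_arch, addr):
        """Store: the data is already in a renamed register, so allocate nothing.
        The mirror entry holds a reference, which keeps that register alive
        until the load picks it up (one reading of the 'small window')."""
        phys = self.rat[src_arch]
        self._acquire(phys)
        old = self.store_map.get(addr)
        self.store_map[addr] = phys
        if old is not None:
            self._release(old)

    def load(self, dst_arch, addr):
        """Load: if a prior store to addr is known, alias instead of allocating."""
        phys = self.store_map.get(addr)
        if phys is None:
            return self.write(dst_arch)  # normal load needs its own destination
        self._remap(dst_arch, phys)       # like mov-elimination: no new register
        return phys


r = Renamer(num_phys=8)
p_a = r.write("rax")          # rax = ...        (allocates one register)
r.store("rax", 0x1000)        # [0x1000] = rax   (no allocation)
p_b = r.load("rbx", 0x1000)   # rbx = [0x1000]   (aliased, no allocation)
assert p_b == p_a and r.allocations == 1
r.write("rax")                # rax overwritten; old register still live via rbx
assert p_a not in r.free
```

In this model, the store and the aliased load together cost no extra registers. The only difference from the non-mirrored case is how long the store's data register stays live.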
I don't know if that is how it is actually implemented, of course!