The question is "what" is listening. For it to be responsive, it's probably on-device hardware that's doing the keyword processing. It would be simple to check, though: look at the network traffic.
It is always listening just like your dog is always listening. If you are not talking to it and you are not saying its name you are being (mostly) ignored.
Most of the time, it is not paying attention: the chip that is processing the sound is looking for the ONE word that will activate it. That passive audio processing happens locally on a chip dedicated to the task. Once it is activated, the expensive processing happens: the sound gets sent to the cloud, processed, and converted to text.
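To make the "ignoring you until it hears its name" idea concrete, here is a minimal sketch of that gating logic. It is not any vendor's actual firmware: `gate_audio`, `is_wake_word`, and `is_end_of_request` are made-up names, and the "frames" are stand-in byte strings rather than real audio.

```python
from typing import Callable, Iterable, List

Frame = bytes  # stand-in for a short chunk of audio samples


def gate_audio(
    frames: Iterable[Frame],
    is_wake_word: Callable[[Frame], bool],       # hypothetical local detector
    is_end_of_request: Callable[[Frame], bool],  # hypothetical end-of-speech check
) -> List[List[Frame]]:
    """Return the groups of frames that would leave the device.

    While idle, every frame is inspected locally and then dropped.
    Only frames heard after the wake word are collected for upload.
    """
    uploads: List[List[Frame]] = []
    current = None  # None means idle (not activated)
    for frame in frames:
        if current is None:
            if is_wake_word(frame):
                current = []  # activated: start collecting the request
            # otherwise the frame is simply discarded
        elif is_end_of_request(frame):
            uploads.append(current)
            current = None  # back to idle
        else:
            current.append(frame)
    if current:  # stream ended mid-request: flush what was collected
        uploads.append(current)
    return uploads


stream = [b"chatter", b"WAKE", b"weather", b"today", b"END", b"private talk"]
sent = gate_audio(stream, lambda f: f == b"WAKE", lambda f: f == b"END")
print(sent)  # only the request between the wake word and the end marker
```

In this toy version, the frames before the wake word and after the end marker never make it into the upload list, which is the behavior described above.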
I understand this, and I don't agree with the people who feel this type of technology should not be embraced. But to be fair, the chip is controlled by software that is constantly connected to the cloud and updating over the air. It would take very little to update the software to disable the on-chip keyword detection and just record everything. That update could easily be done without your knowledge, and in a way that would be almost undetectable, since the software stack doesn't appear to be open and the server stack is in the cloud and out of your control.
It's much easier to listen for a single word than to do the rest of the voice-recognition tasks. It would be a huge waste to upload all of the audio all the time, so usually these systems do the one-word detection on the device. They keep a rolling buffer of a few seconds so that when the device detects the hotword, it can send that buffered audio to the cloud. It helps with noise removal. But not everything.
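For anyone wondering what a "rolling buffer" looks like, here is a rough sketch using a fixed-size deque. The class and function names (`PreRollBuffer`, `capture_on_hotword`) and the three-frame size are assumptions for illustration only; real devices work on audio samples, not strings.

```python
from collections import deque
from typing import Iterable, List


class PreRollBuffer:
    """Keeps only the most recent `max_frames` frames; older ones fall off."""

    def __init__(self, max_frames: int) -> None:
        self._frames = deque(maxlen=max_frames)

    def push(self, frame: str) -> None:
        self._frames.append(frame)  # silently evicts the oldest frame when full

    def snapshot(self) -> List[str]:
        return list(self._frames)


def capture_on_hotword(
    frames: Iterable[str], hotword: str, pre_roll_frames: int = 3
) -> List[str]:
    """Return the buffered frames at the moment the hotword is heard.

    Everything older than the buffer has already been overwritten,
    so it cannot be sent anywhere. Returns [] if the hotword never occurs.
    """
    buf = PreRollBuffer(pre_roll_frames)
    for frame in frames:
        buf.push(frame)
        if frame == hotword:
            return buf.snapshot()
    return []


captured = capture_on_hotword(["a", "b", "c", "d", "HOT", "e"], "HOT")
print(captured)  # ['c', 'd', 'HOT']
```

The point of the fixed `maxlen` is that the device only ever holds a few seconds of audio at once, and that small window is all that exists to send when the hotword fires.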
That means that it is ALWAYS listening. It needs to listen for that trigger word at all times.