Music performance is a synchronized effort with very precise timing. If there is lag, everything is just a bit late and a musician can (unconsciously) compensate up to a point. But, for example (if my math is correct), 120 bpm puts 8th notes every 250 ms.
If the drum beat is sometimes, and only sometimes, 100 ms later than it should be, the result is not nice at all. And that is exactly what random jitter does.
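A quick back-of-the-envelope sketch in Python of the numbers above; the 25% chance of a 100 ms late hit is purely illustrative:

    import random

    # 8th-note grid at 120 bpm: 60,000 ms per minute / 120 beats / 2 subdivisions
    bpm = 120
    eighth_ms = 60_000 / bpm / 2          # 250 ms between 8th notes
    print(f"8th-note spacing at {bpm} bpm: {eighth_ms:.0f} ms")

    # Ideal hit times vs. a drum part that is "sometimes, and only sometimes" 100 ms late
    random.seed(1)
    ideal = [i * eighth_ms for i in range(8)]
    actual = [t + (100 if random.random() < 0.25 else 0) for t in ideal]
    for t_ideal, t_actual in zip(ideal, actual):
        print(f"ideal {t_ideal:6.0f} ms   actual {t_actual:6.0f} ms   ({t_actual - t_ideal:+4.0f} ms)")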
There’s also MIDI timecode, which has to send four messages per frame (up to 120 messages per second) and whose entire job is to convey timing. It’s basically unusable over WiFi.
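The arithmetic behind the "up to 120 messages per second", as a small sketch over the standard MTC frame rates; the point is that each quarter-frame message is supposed to land on a grid of a few milliseconds, which WiFi jitter easily blows past:

    # MIDI Time Code sends 4 quarter-frame messages per video frame,
    # so the message rate scales with the timecode frame rate.
    for fps in (24, 25, 29.97, 30):
        per_second = 4 * fps              # quarter-frame messages per second
        spacing_ms = 1000 / per_second    # nominal spacing between messages
        print(f"{fps:>5} fps -> {per_second:6.2f} msg/s, one every {spacing_ms:.2f} ms")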
As an amateur musician, I have played with real-time synthesis a lot. Latency, and especially jitter in latency, is the biggest enemy. 100 ms is an eternity. 5-7 ms is still noticeable, and anything above 10 ms becomes a nuisance. Some artists, especially drummers, hear 2 ms differences.
And it’s quite logical, really. A reasonably fast tempo is 180 bpm. Playing sixteenth notes separates them by about 80 ms. Then you have the separation in swing-style sixteenths, or in funk (which is often played a tiny fraction ahead of the pulse), and the real scale is around 20 ms. That’s comparable to 50 frames per second.
This is also one of the reasons why orchestras need conductors: the right side would hear the left side roughly 100 ms later due to the speed of sound.
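Working those two numbers out (assuming roughly 343 m/s for the speed of sound in air; the distances are just illustrative stage widths):

    SPEED_OF_SOUND = 343.0                # m/s in air at ~20 °C

    # 16th-note spacing at a fast tempo
    bpm = 180
    sixteenth_ms = 60_000 / bpm / 4       # ~83 ms between 16th notes
    print(f"16th-note spacing at {bpm} bpm: {sixteenth_ms:.1f} ms")

    # Acoustic delay between players sitting various distances apart
    for metres in (5, 10, 17, 34):
        delay_ms = metres / SPEED_OF_SOUND * 1000
        print(f"{metres:>3} m apart -> {delay_ms:5.1f} ms of acoustic delay")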
I am of the belief that you need to be in the sub-millisecond range for jitter in order to capture a performance authentically.
Whatever the range of perception is, there’s going to be another range where people can’t consciously describe what they are perceiving, yet it affects them.
If you are playing an instrument, e.g. a MIDI keyboard, I’ve found that anything above around 10 ms creates a noticeable lag between when you physically touch the key and when you hear the sound.
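One common source of that lag is the audio buffer itself; a minimal sketch, assuming a 48 kHz sample rate and ignoring MIDI transport, driver and plugin overhead (which only add to it):

    # Output latency contributed by the audio buffer alone
    SAMPLE_RATE = 48_000                  # Hz, a common audio interface setting

    for buffer_frames in (64, 128, 256, 512, 1024):
        latency_ms = buffer_frames / SAMPLE_RATE * 1000
        note = "  <- past the ~10 ms mark mentioned above" if latency_ms > 10 else ""
        print(f"{buffer_frames:>5}-frame buffer -> {latency_ms:5.2f} ms{note}")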