
You still won't be safe against several/most of the causes outlined in the page from the Ardour manual that I linked to.

SMIs, random hardware that locks the bus for too long ... out of the kernel's control.

If you really want 10msec finger->ear (a goal that I would say is reasonable, though given many live performers' and mixing engineers' normal physical setups, probably excessive) and you want to guarantee it, it's not just the OS but the hardware you'll have to change. You cannot guarantee the required scheduling on general purpose (intel/amd) motherboards unless you take great care with the selection, and even then ... I've heard that financial services/investment firms are the main reason you can still buy mobos without SMIs, because their "latency" requirements would be broken by these interrupts.

On the other hand, the "not-guaranteed" case with a reasonable mobo, sensible bus connected devices, an RT kernel and properly written user space software is going to work almost all of the time. Just no guarantees.



Paul,

Thank you very much — for your ongoing work in Open Source audio, for being willing to engage at length in this thread, and for being straightforward about what the system can deliver.

> If you really want 10msec finger->ear (a goal that I would say is reasonable, though given many live performers' and mixing engineers' normal physical setups, probably excessive)

I worked in a recording studio for 6 years, including two years as a mastering engineer. Most of the sonic adjustments I would make during mastering and mixing fell below the threshold of perception — but when added together they would produce something well above the threshold of perception.

There's nothing magic about this. I don't have "golden ears" (although, having trained, I can identify certain patterns more quickly than people who haven't).

The point is simply that an aggregation of imperceptible changes can sum to a perceptible result. It's akin to why you perform intermediate processing in both video and audio at a higher resolution than the final delivery medium: otherwise an accumulation of small, individually imperceptible degradations causes perceptible degradation of the finished product.
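
To make the accumulation concrete, here is a deliberately simplified, single-sample sketch of my own (not anything from the studio workflow above; build with gcc -lm): the same chain of small gain changes applied once with the intermediate value kept in double precision, and once truncated to 16 bits after every stage with no dither. Each stage's rounding error is a fraction of one 16-bit step, yet the truncated chain ends up several steps away from the high-resolution result.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const int    stages = 20;
        const double gain   = 0.987;  /* one small, individually inaudible adjustment */
        double hi = 0.3;              /* intermediate kept in double precision */
        double lo = 0.3;              /* intermediate truncated to 16 bits each stage */

        for (int i = 0; i < stages; i++) {
            hi *= gain;
            lo *= gain;
            lo = floor(lo * 32768.0) / 32768.0;  /* requantize to 16 bits, no dither */
        }

        printf("double-precision chain : %.9f\n", hi);
        printf("16-bit intermediates   : %.9f\n", lo);
        printf("accumulated error      : %.1f LSBs at 16 bit\n", fabs(hi - lo) * 32768.0);
        return 0;
    }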

And so, I dispute the idea that just because there are other sources of latency, we should resign ourselves and accept substantial contributors to latency just because they individually fall below the perceptual threshold. The only number that matters is the final sum of all latencies.

> You cannot guarantee the required scheduling on general purpose (intel/amd) motherboards unless you take great care with the selection, and even then ... I've heard that financial services/investment firms are the main reason you can still buy mobos without SMIs, because their "latency" requirements would be broken by these interrupts.

With this in mind, I will set aside one possibility I'd considered: writing for general purpose CPUs (e.g. multicore x86_64) outside of mainstream operating systems.

Instead, while I'll continue prototyping the project on mainstream operating systems, I'll probably look more deeply into dedicated outboard DSP boards.

> On the other hand, the "not-guaranteed" case with a reasonable mobo, sensible bus connected devices, an RT kernel and properly written user space software is going to work almost all of the time. Just no guarantees.

I appreciate how hard you've worked to achieve that.

My question, then, is: how can I be confident that I'm actually meeting these "almost-all-of-the-time" latency requirements?

In my experience, most systems recommend that you lower the latency until you hear clicks and pops. That convention leaves me... dissatisfied. A dropout is a detectable event, and the monitoring system should surface it.

Just as maddening, when you have exacting standards, is a system that falls back and delivers something subtly degraded without telling you, for example raising the latency without warning because it would otherwise fail outright. I understand why systems are designed to prefer degradation over failure, but for my purposes I need to know when it happens. Expecting me to monitor continuously for an effect at the threshold of perception, such as subtly increased latency, is draining and ultimately unreasonable.

We have meters for noise floors and red lights indicating that clipping occurred. What facilities exist to help me understand when the latency behaviors of my rig are not meeting my requirements?


By "probably excessive" I wasn't referring to psycho-acoustics. I merely meant that given your willingess to include speaker->ear latency, many people work on music in scenarios where that measure alone is already close to or above 10msec. The worst case scenario is likely a pipe organ player, who deals with latencies measured in units of seconds. Humans can deal with this without much difficulty - it is jitter that makes it hard (or impossible) to perform, not latency. Long-standing drummer & bass player duos generally report being able to deal with about 10 msec when performing live.

On the flip side, you have people arguing convincingly that the comb filtering caused by phased reflections inside almost every listening environment is responsible for the overwhelming majority of what people hear as "different". Move your head 1ft ... lose entire frequency bands ... move it again, get them all back and then some!

Regarding latency deadlines: well, the device driver can tell you (and does, if you ask it). If you use JACK, it will call back into your client every time there is an xrun reported by the audio hardware driver. This in turn has a quite simple definition: user space has not advanced the relevant buffer pointer before the next interrupt. There are circumstances where this actually isn't a problem (because the data has already been handled), but it is a fairly solid way of knowing whether the software is keeping up with the hardware. Something using ALSA directly can determine this in the same way that JACK does.
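
For anyone who wants to see what that looks like in code, here is a minimal monitoring sketch assuming only the standard JACK C API (the client name and the once-per-second reporting are just illustrative choices): jack_set_xrun_callback() registers a function that JACK invokes each time the driver reports a missed cycle, so dropouts become explicit events rather than something you have to listen for.

    /* build with: gcc xrun_monitor.c -o xrun_monitor -ljack */
    #include <stdio.h>
    #include <unistd.h>
    #include <jack/jack.h>

    static volatile int xruns = 0;

    /* invoked by JACK whenever the driver reports an xrun; keep it cheap and just count */
    static int on_xrun(void *arg)
    {
        (void) arg;
        xruns++;
        return 0;
    }

    int main(void)
    {
        jack_status_t status;
        jack_client_t *client = jack_client_open("xrun_monitor", JackNullOption, &status);
        if (client == NULL) {
            fprintf(stderr, "could not connect to JACK (status 0x%x)\n", (unsigned) status);
            return 1;
        }

        jack_set_xrun_callback(client, on_xrun, NULL);
        jack_activate(client);

        /* report from the main loop rather than printing inside the callback */
        int reported = 0;
        for (;;) {
            sleep(1);
            if (xruns != reported) {
                reported = xruns;
                fprintf(stderr, "%d xrun(s) so far\n", reported);
            }
        }
    }

A client talking to ALSA directly gets the same signal when its read/write calls return -EPIPE, which is how the PCM layer reports an xrun.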

For audio, there is no other measurement of this that really matters. Using some sort of system clock to try to check timing, while likely to be kinda-sorta accurate enough, ignores the fact that the only clock that matters is the sample clock. If you're operating with huge margins of safety, some other clock measuring time is good enough, but as you begin to inch closer to problem territory, it really isn't. For reference, we generally find that when "CPU loads" (variously measured) get close to 80% on macOS and Linux, scheduling deadlines start failing.
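
As a rough companion to the xrun callback, JACK also publishes its own estimate of how much of each process cycle the graph is consuming, expressed as a percentage of the time available per cycle (which is set by the buffer size and sample rate, not by any wall clock). A sketch like the following polls jack_cpu_load() to watch the remaining headroom; the 80% threshold is simply the figure mentioned above, not an API constant.

    /* build with: gcc load_monitor.c -o load_monitor -ljack */
    #include <stdio.h>
    #include <unistd.h>
    #include <jack/jack.h>

    int main(void)
    {
        jack_client_t *client = jack_client_open("load_monitor", JackNullOption, NULL);
        if (client == NULL)
            return 1;
        jack_activate(client);

        for (;;) {
            float load = jack_cpu_load(client);  /* % of the per-cycle time budget in use */
            if (load > 80.0f)
                fprintf(stderr, "warning: DSP load %.1f%%, deadlines at risk\n", load);
            sleep(1);
        }
    }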

Nothing on Linux will automatically "fall back" to less demanding latency requirements. If the system can't meet the requirements of the audio interface, it will continue to fail. This is actually part of the reason why Ardour tends not to deactivate plugins - the user can expect the DSP/CPU load to be more or less constant no matter what they do, rather than being low and then climbing through a threshold that causes problems as they do stuff.



