
Can a unikernel be used to create a sound/music application which is not afflicted by the latency problems which bedevil such applications when running inside traditional operating systems?

For example, could it be used for a softsynth which takes in midi input and emits digital audio with the OS only contributing guaranteed sub-millisecond latency?



Is that not what the Linux realtime patches do? https://wiki.archlinux.org/index.php/Realtime_kernel_patchse...


I've been researching this for a long time, and it's unclear to me whether Linux with PREEMPT_RT patches would meet that sub-ms requirement. I see numbers all over the place from various sources, from sub-millisecond to more than 10 ms.

10 milliseconds is often considered the threshold of human perception with regards to audio latency. (It's possible that it's lower under some circumstances, but 10 ms is a good enough approximation for my purposes.) My goal is to create a sound/music tool which runs lots of DSP and which has a total system latency of under 10 milliseconds, including OS, application (including intrinsic latency of DSP procedures), hardware, and sound in air (about 1 ms per 3 meters).
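
To make that budget concrete with purely illustrative numbers of my own (assumptions, not measurements):

    2 x 64 samples @ 48 kHz of interface buffering    ~2.7 ms
    block-based DSP and converter latency             ~2-3 ms
    3 m of air between speaker and ear                ~1.0 ms

That already consumes 6-7 ms of the 10 ms budget before the OS has contributed anything, which is why I want the scheduling contribution pinned down to well under a millisecond.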

When I read about latency, I often see "we're at 6-9 ms, that's good enough because it's not perceptible". Unfortunately, that's not good enough if there are several components which contribute to total system latency and they are all pushing 10 ms. Hence, my sub-ms requirement for the OS.

Committing to a platform will be a costly choice. I don't want to invest in writing for realtime Linux only to find that I really need to run a hardcore RTOS, or run dedicated DSP chips, etc.

That's why I'm interested in whether running a unikernel can offer stronger guarantees.


I write realtime audio software for Linux, and have done so for 20 years.

While your goal is admirable - it certainly is possible to come up with scenarios where "sub-ms" latency is desirable - it's really not relevant.

Your stated goal ("...lots of DSP ... under 10 msec") is already entirely achievable on Linux, assuming you're close enough to your speakers (or wearing headphones).

But sub-msec can only make sense here if it describes scheduler latency, since there's no audio hardware that can function in the sub-msec range. The Linux scheduler is way, way below that threshold, and SCHED_FIFO threads will see that performance barring hardware issues (cf. https://manual.ardour.org/setting-up-your-system/the-right-c...)
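
For anyone wondering what that means concretely: a minimal sketch (illustrative, not Ardour code; the priority value is an arbitrary assumption, and in practice you also need rtprio privileges, e.g. via /etc/security/limits.conf) of promoting an audio thread to SCHED_FIFO:

    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    /* Promote the calling thread to SCHED_FIFO so the kernel wakes it
       ahead of normal SCHED_OTHER threads when the audio interrupt fires. */
    static int make_realtime(int priority)
    {
        struct sched_param param;
        memset(&param, 0, sizeof(param));
        param.sched_priority = priority;   /* 1..99; e.g. 70, chosen arbitrarily */

        int err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
        if (err != 0) {
            /* Typically EPERM: the user lacks realtime privileges. */
            fprintf(stderr, "SCHED_FIFO failed: %s\n", strerror(err));
            return -1;
        }
        return 0;
    }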

Finally .... writing audio software is a lot of fun. But please don't just jump in without seeing if you can instead contribute to an existing project first. The field is littered with the dead and half-dead corpses representing the discarded work of developers who thought it would be fun, and then moved on.


I'm curious what you think of PipeWire:

https://gitlab.freedesktop.org/pipewire/pipewire/-/wikis/Per...

It claims to get near-JACK latency, while implementing a more generic audio graph with better security than PulseAudio.

In regards to Linux scheduler performance, do you have any experience with SCHED_DEADLINE?


We don't use SCHED_DEADLINE with audio, because things are driven by hardware interrupts/requirements, which makes SCHED_FIFO more appropriate ("wake up! time to do the work! right now! until it's done!"). macOS doesn't really have SCHED_FIFO, so we end up forced to use something akin to deadline scheduling there. It works fine.
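
For anyone who hasn't used it: SCHED_DEADLINE asks you to declare a runtime/deadline/period triple up front via the raw sched_setattr() syscall (glibc doesn't wrap it), which is exactly the mismatch for interrupt-driven audio. A rough sketch with illustrative numbers, not anything Ardour ships:

    #define _GNU_SOURCE
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* glibc has no wrapper or struct for sched_setattr(), so define them here. */
    struct sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime;    /* ns of CPU time reserved per period      */
        uint64_t sched_deadline;   /* ns by which that runtime must be given  */
        uint64_t sched_period;     /* ns between activations                  */
    };

    #define MY_SCHED_DEADLINE 6    /* value of SCHED_DEADLINE in linux/sched.h */

    int main(void)
    {
        struct sched_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.size           = sizeof(attr);
        attr.sched_policy   = MY_SCHED_DEADLINE;
        attr.sched_runtime  =  500 * 1000;   /* 0.5 ms of work ...               */
        attr.sched_deadline = 1000 * 1000;   /* ... delivered within 1 ms ...    */
        attr.sched_period   = 1333 * 1000;   /* ... every ~1.33 ms (64 @ 48 kHz) */

        if (syscall(SYS_sched_setattr, 0, &attr, 0) != 0) {
            perror("sched_setattr");
            return 1;
        }
        /* The periodic audio work would run here. */
        return 0;
    }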

I commented a lot on PipeWire when it started. Thankfully Robin Gareus and others managed to get its developer to change course somewhere along the journey to use a better basic model (notably, pull not push). I no longer follow its development. If it works, it will be good for Linux audio. I don't know if it will, or won't.

[for anyone who doesn't know, I'm the original author of JACK]


Hey, thanks for all your work.


As for contributing to existing projects, I hear you — I've been highly active in Open Source for well over a decade. Conceptually, the closest thing to what I want to do is Pure Data — and I have in fact contributed to it.

But I'm also extremely motivated and willing to go all the way down and write the entire thing from scratch if I have to. Or to learn enough about every last step in the chain that I can actually control for latency — and it may be that it takes about the same amount of work. Finding numbers I can trust on latency seems hopeless. Everybody fudges rather than fails.


The word "latency" covers many things, many subtly different from each other. One of its meanings, unrelated to the one you're talking about, is the delay in signal flow caused by algorithms (many digital filters, for example). Often called "plugin delay compensation", or more generically "latency compensation".

Let me just point out that it has taken 20 years and a guy whose PhD thesis was about latency compensation in a DAW to finally "fix" this sort of "latency" in Ardour. This is a massively harder problem from a design perspective than the scheduling latency issues you've referred to.

[ EDIT: to be fair, the actual correct solution to latency compensation didn't take 20 years to implement, more like a year or so when taking place within the context of a large existing code base. ]
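
For the simplest possible case (parallel tracks mixed straight to one bus), the core idea is just to delay every track by the difference between its own plugin latency and the largest one in the session. A toy sketch of that follows, not Ardour's code; the hard part is doing it across arbitrary routing, sends, and latencies that change while the transport rolls:

    #include <stddef.h>

    /* Toy latency compensation for N parallel tracks feeding one bus:
       pad each track so that every signal arrives as late as the most
       latent one, keeping them time-aligned at the mix point. */
    static void compute_compensation(const size_t *plugin_latency, /* samples */
                                     size_t *extra_delay,          /* samples, out */
                                     size_t ntracks)
    {
        size_t max_latency = 0;
        for (size_t i = 0; i < ntracks; ++i)
            if (plugin_latency[i] > max_latency)
                max_latency = plugin_latency[i];

        for (size_t i = 0; i < ntracks; ++i)
            extra_delay[i] = max_latency - plugin_latency[i];
    }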


The complexity of solving such problems in general is actually what drives me to start from scratch: rather than solve an intractable general problem, I can limit the scope of where the project must run and what it must do.

My perception is that I am not capable of reliably tuning a general purpose OS for a total system latency of under 10 msec. I can't give the total system a hard number and believe that it will obey; instead, I need to perform a lot of esoteric tweakery of subsystems I probably don't understand. The system won't alert me reliably when it fails, but will instead either drop out or just give me more latency than I asked for — and there are innumerable factors outside my control that could cause it to fail.

However, the composition tool I want to create has a pretty small set of requirements, if I accept that it only need serve my particular use case. So how about I ensure that my app is the only thing running on the hardware, via unikernel, or RTOS, or even bare metal?

Implementing all of my compositional requirements is probably easier and certainly more rewarding than tweaking latency parameters without having confidence that my results will be enduring or predictable.

If it turns out that the knowledge I gain during that exercise allows me to control latency well enough and I can return to mainline operating systems and contribute to existing projects, all the better. I don't really want to go down this path, but I'm not willing to accept a tool that maybe-kinda-sorta-sometimes meets my absolute requirements.


You still won't be safe against several/most of the causes outlined in the page from the Ardour manual that I linked to.

SMIs, random hardware that locks the bus for too long ... out of the kernel's control.

If you really want 10msec finger->ear (a goal that I would say is reasonable, though given many live performers' and mixing engineers' normal physical setups, probably excessive) and you want to guarantee it, it's not just the OS but the hardware you'll have to change. You cannot guarantee the required scheduling on general-purpose (Intel/AMD) motherboards unless you take great care with the selection, and even then ... I've heard that financial services/investment firms are the main reason you can still buy mobos without SMIs, because their "latency" requirements would be broken by these interrupts.

On the other hand, the "not-guaranteed" case with a reasonable mobo, sensible bus connected devices, an RT kernel and properly written user space software is going to work almost all of the time. Just no guarantees.


Paul,

Thank you very much — for your ongoing work in Open Source audio, for being willing to engage at length in this thread, and for being straightforward about what the system can deliver.

> If you really want 10msec finger->ear (a goal that I would say is reasonable, though given many live performer's and mixing engineer's normal physical setups, probably excessive)

I worked in a recording studio for 6 years, including two years as a mastering engineer. Most of the sonic adjustments I would make during mastering and mixing fell below the threshold of perception, but added together they would produce something well above it.

There's nothing magic about this. I don't have "golden ears" (although, having trained, I can identify certain patterns more quickly than people who haven't).

The point is simply that an aggregation of imperceptible changes can sum to a perceptible result. It's akin to why you perform intermediate processing in both video and audio at a higher resolution than the final delivery medium: otherwise an accumulation of small, possibly imperceptible degradations will cause perceptible degradation of the finished product.

And so, I dispute the idea that just because there are other sources of latency, we should resign ourselves to accepting substantial contributors to latency simply because each falls below the perceptual threshold. The only number that matters is the final sum of all latencies.

> You cannot guarantee the required scheduling on general-purpose (Intel/AMD) motherboards unless you take great care with the selection, and even then ... I've heard that financial services/investment firms are the main reason you can still buy mobos without SMIs, because their "latency" requirements would be broken by these interrupts.

With this in mind, I will set aside one possibility I'd considered: writing for general-purpose CPUs (e.g. multicore x86_64) outside of mainstream operating systems.

Instead, while I'll continue prototyping the project on mainstream operating systems, I'll probably look more deeply into dedicated outboard DSP boards.

> On the other hand, the "not-guaranteed" case with a reasonable mobo, sensible bus connected devices, an RT kernel and properly written user space software is going to work almost all of the time. Just no guarantees.

I appreciate how hard you've worked to achieve that.

My question, then, is: how can I be confident that I'm actually meeting these "almost-all-of-the-time" latency requirements?

In my experience, most systems recommend that you lower the latency until you hear clicks and pops. That convention leaves me... dissatisfied. A dropout is a detectable event, and the monitoring system should surface it.

Just as troubling, when you have exacting standards, is a system that falls back and delivers something subtly degraded without telling you, for example changing the latency without warning because it would otherwise go down. I understand why systems are designed to prefer degradation over failure, but for my purposes I need to know when it happens. Expecting me to monitor continuously for an effect at the threshold of perception, such as subtly increased latency, is draining, and ultimately unreasonable.

We have meters for noise floors and red lights indicating that clipping occurred. What facilities exist to help me understand when the latency behaviors of my rig are not meeting my requirements?


By "probably excessive" I wasn't referring to psycho-acoustics. I merely meant that given your willingess to include speaker->ear latency, many people work on music in scenarios where that measure alone is already close to or above 10msec. The worst case scenario is likely a pipe organ player, who deals with latencies measured in units of seconds. Humans can deal with this without much difficulty - it is jitter that makes it hard (or impossible) to perform, not latency. Long-standing drummer & bass player duos generally report being able to deal with about 10 msec when performing live.

On the flip side, you have people arguing convincingly that the comb filtering caused by phased reflections in almost every listening scenario is responsible for the overwhelming majority of what people perceive as "different". Move your head 1 ft ... lose entire frequency bands ... move it again, get them all back and then some!

Regarding latency deadlines: well, the device driver can tell you (and does, if you ask it). If you use JACK, it will call back into your client every time there is an xrun reported by the audio hardware driver. This in turn has a quite simple definition: user space has not advanced the relevant buffer pointer before the next interrupt. There are circumstances where this actually isn't a problem (because the data has already been handled), but it is a fairly solid way of knowing whether the software is keeping up with the hardware. Something using ALSA directly can determine this in the same way that JACK does.
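
In code terms the minimal wiring looks something like this (a bare sketch, error handling trimmed; the client name is arbitrary):

    #include <jack/jack.h>
    #include <stdio.h>
    #include <unistd.h>

    static volatile unsigned long xrun_count = 0;

    /* Invoked by JACK whenever the driver reports that user space did not
       keep up with the hardware before the next interrupt (an "xrun"). */
    static int on_xrun(void *arg)
    {
        (void) arg;
        ++xrun_count;
        return 0;
    }

    static int on_process(jack_nframes_t nframes, void *arg)
    {
        (void) nframes; (void) arg;
        /* DSP goes here; if it overruns the period, on_xrun will fire. */
        return 0;
    }

    int main(void)
    {
        jack_client_t *client = jack_client_open("xrun-monitor", JackNullOption, NULL);
        if (!client)
            return 1;

        jack_set_xrun_callback(client, on_xrun, NULL);
        jack_set_process_callback(client, on_process, NULL);
        jack_activate(client);

        sleep(60);   /* run for a minute, then report */
        printf("xruns in the last minute: %lu\n", xrun_count);

        jack_client_close(client);
        return 0;
    }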

For audio, there is no other measurement of this that really matters. Using some sort of system clock to try to check timing, while likely to be kinda-sorta accurate enough, ignores the fact that the only clock that matters is the sample clock. If you're operating with huge margins of safety, some other clock measuring time is good enough, but as you begin to inch closer to problem territory, it really isn't. For reference, we generally find that when "CPU loads" (variously measured) get close to 80% on macOS and Linux, scheduling deadlines start failing.

Nothing on Linux will automatically "fall back" to less demanding latency requirements. If the system can't meet the requirements of the audio interface, it will continue to fail. This is actually part of the reason why Ardour tends not to deactivate plugins - the user can expect the DSP/CPU load to be more or less constant no matter what they do, rather than being low and then climbing through a threshold that causes problems as they do stuff.


If you want to check, I have a small (and not very good, tbh) benchmarking chapter in my thesis where I detail the steps I followed to get fairly low scheduler tick times for my software - for simple graphs the best I could do was under 100 microseconds per tick, with ~20 microseconds of jitter (p. 187):

https://tel.archives-ouvertes.fr/tel-01947309/document

Also see the intro of Section 10.4 for a couple of relevant references. Of course, that only benchmarks the part that you as a developer can have a meaningful impact on, not the time spent in JACK / ALSA / ...
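
If you want to reproduce something like that, the basic idea is just to timestamp each tick with CLOCK_MONOTONIC and keep the worst case over a long run (a bare sketch with made-up names, not the actual benchmarking code from the thesis):

    #include <time.h>

    static double worst_tick_us = 0.0;

    /* Wrap the per-tick DSP in a CLOCK_MONOTONIC measurement; the spread of
       these values over a long run is the execution-time jitter. */
    static void timed_tick(void (*dsp_tick)(void))
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        dsp_tick();
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e3;
        if (us > worst_tick_us)
            worst_tick_us = us;
    }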

To give an anecdote: with the hardware mentioned at the beginning of said chapter, I can reliably run JACK with a 16-sample buffer at 44100 Hz (~0.7 ms) and do a few things without crackles. Not a lot, though :p
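
(For anyone checking the arithmetic: that ~0.7 ms assumes the usual two JACK periods, i.e. 2 x 16 / 44100 ≈ 0.73 ms.)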


What audio hardware are you using that allows 16 samples per buffer? Must be something ancient, I think?


It's my PCI Express Multiface II, which I will cherish until death - it has been rocking hard for the last 11 years so far :-) so I'm hopeful I can get a bit more out of it!



