Unikernels: The Next Stage of Linux’s Dominance (2019) (acm.org)
247 points by todsacerdoti on May 16, 2020 | 180 comments


Seems to me that a unikernel database should be the first application. Databases tend to bypass practically all the facilities of a kernel anyway. It’s often surprised me that they haven’t merged before now.


I think this is the paper https://www.cs.bu.edu/~jappavoo/Resources/Papers/unikernel-h.... There's a great summary by cormacrelf below of some of the details. This kind of thing comes and goes across the decades. Back in the 90s it was common for databases to implement their own file systems or memory management, but gradually the OSs of the day added features that made this unnecessary. As we found then, you lose a lot of flexibility (modifying the OS via config changes, adding new monitoring, and so on) if the DB is directly connected to the OS. I'd hate to give that up for more efficiency. There's a lot of 'strength' in an OS separating various things from the database itself, giving that flexibility.

A modern DB needs efficient interaction with the OS of course, but I'd say, based on experience at many companies, a bigger challenge than raw efficiency is the ability to implement changes in the DB itself. I've worked on 4 or 5 major databases and we can always identify many ways to improve execution, plan selection, and various other aspects of the DB; it's the giant challenge of altering a big code base that blocks improvements, more than OS layers. Improving plan choice can make queries thousands of times faster, but you have to be able to implement it.


  Back in the 90s it was common for databases to implement their own file systems or memory management
Informix in the mid/late 1980s initially had Shared Memory implementations on UNIX platforms that supported it. Then, the Turbo/OnLine/IDS servers added the option of raw disk partition use for database spaces to avoid filesystems altogether (and do raw unbuffered I/O).
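For flavour, the modern way to get that style of raw unbuffered I/O on Linux is O_DIRECT; a minimal sketch, with a hypothetical device path and error handling omitted:

  #define _GNU_SOURCE   /* for O_DIRECT */
  #include <fcntl.h>
  #include <stdlib.h>
  #include <unistd.h>

  int main(void) {
      /* Bypass the page cache entirely, as the old raw-partition
         servers did. O_DIRECT requires sector-aligned buffers,
         offsets and sizes. */
      int fd = open("/dev/sdb1", O_RDWR | O_DIRECT);  /* hypothetical device */
      void *buf;
      posix_memalign(&buf, 4096, 4096);  /* 4 KiB aligned buffer */
      pread(fd, buf, 4096, 0);           /* unbuffered read of block 0 */
      free(buf);
      close(fd);
      return 0;
  }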


They also tend to need to be debugged & administered.


There is plenty of work in this area. The team behind OSv ended up writing https://www.scylladb.com/ because of their earlier interest in databases.


So you're saying unikernel experts ended up writing a non-unikernel database. There's a lesson right there.


I wouldn't draw that conclusion. A) Taking VC money sometimes points you at different types of businesses that you are urged to explore more. B) Just because you can run a database as a unikernel (and you can) doesn't mean the people who want the database want to deploy it as one. In my experience those who buy managed databases generally aren't wanting to do any system administration at all.


https://twitter.com/cpswan/status/971335126566211585?s=20

Maybe not exactly what you're thinking. But somewhat in the same domain.


kOS (k language as a standalone OS) was a project but I guess it got canned. Unfortunate, I would love to have tried out a standalone kdb+/q installation.


> Unikernels have demonstrated enormous advantages over Linux in many important domains

By domains, do they mean that as "actually in use in certain sectors in the industry" or "a prototype has shown that"?

> causing some to propose that the days of Linux's dominance may be coming to an end

Who exactly would make that claim?

So, besides performance: what actual, real-life problems does this solve? I think there is some overlap with containers, and at this point, replacing them will require something a lot better.


> So, besides performance: what actual, real-life problems does this solve?

Well, don't go jumping to "besides performance". Dennard scaling is dead, and the things that can interrupt your program in a full preemptive kernel system are myriad.

Furthermore, even if your application properly manages and produces backpressure, the OS can introduce buffering where you don't want it, often by necessity (to avoid massive context switch costs). Now, if you just want to manage backpressure over the network, an application-hosted networking stack is probably a fine solution, but if you want to accurately translate disk backpressure to network backpressure it gets more complicated.

There are heaps and heaps of ordinary programs written for preemptive systems that would see noticeable, tangible benefits to users if they were run instead on unikernels; but the main thing halting adoption is the inconvenience of adapting programs. I tried getting Capstan/OSv to work the other day and the documentation dragged me through several apparently-outdated methods of achieving the same thing, all of which failed in incomprehensible ways. Tooling for these things could use a lot of work.


So, a niche technology for a handful of HFT firms, betting platforms and hyper-scale cloud vendors. Those already use non-mainstream tech (eg LMAX disruptor) to achieve max performance.

This is all nice and good, but the unikernel guys have claimed for 20 years that it's the next big revolutionary thing. In the real world, the evolutionary approach using containers has turned out to solve the problems most people care about.

Don't get me wrong, this is fascinating stuff. But there is a difference between "changes mainstream computing" and "cool tech to shave off another millisecond of latency".


Unikernels could also be the next phase of cloud orchestration technologies. They would be truly lightweight VMs, and would obviate the need for containers (collapsing the VM orchestration and container orchestration layers into a single orchestration layer). This would offer better security and better performance without impacting developer/operator friendliness. As a side note on improved performance: in the cloud space, being able to boot in ten milliseconds means you can start a VM in the scope of a single request, which means you can scale up dynamically without keeping a bunch of VMs running idle in the background just in case your traffic spikes. This is real, significant cost savings for cloud providers and customers.


Debugging a single-executable container is quite unpleasant. I assume debugging unikernels or thin VMs (with a kernel and a single program) would be a similar experience.


Yep dev tooling matters. Containers are already annoying, but Unikernels are a different universe, and it is not going to be pleasant.


Depends on what is meant by debugging. If you're talking about running a debugger on your program, you can attach GDB to your VM any time you want.


Could you not do all your debugging and development on a normal host and only deploy on a unikernel when you are stable? There is already a divide between development and production with debug and release builds, the latter of which are a massive pain to debug.


> debugging and development on a normal host and only deploy on a unikernel when you are stable?

Sure! And, still, when you deploy, it fails. At least one of the assumptions that led you to conclude that your development/testing environment is equivalent to the production one is flawed. Find which one.


That is not a new problem. To some extent it already happens with debug and release builds, with running on a VM vs a real host, running on a different distribution than the one you developed in, etc. If the performance gains are significant in some domains that would be more than enough to pay for the extra pain.


> running on a VM vs a real host, running on a different distribution than the one you developed in

Except that, in those cases, you can connect to the server, spin up a shell, and actually get an idea from the file system, from logs, or from any of the many other ways you have to inspect a system. You'd need to provide all of that in your unikernel.


Not much of a problem in theory; just include the relevant library.


You can compile your debugging utilities into a unikernel for prod or debug builds (or not at all) depending on your development preferences, performance requirements, and security requirements.


> being able to boot in ten ms means you can start a VM in the scope of a single request

That is actually a good point. Though this is also achievable with WebAssembly (see CloudFlare), but Unikernels could do it more generally I suppose.

On the other hand, AWS Lambda has yet to drive classic EC2 to extinction.


Yeah, WebAssembly workers are an exciting idea as well. Unikernels are similar, but they remove a couple of layers of abstraction (another way to cut layers would be to implement a low-level hypervisor that runs WASM).

As for lambda not driving EC2 to extinction, these are different technologies for different use cases, and lambda is a relatively new technology—it takes a long time for big companies to change their ways, and it rarely makes sense to port a stable, legacy EC2 code base to functions. There are still plenty of mainframes kicking around for the same reasons.


It’s much easier to tweak a normal kernel (or just use whatever is already there, eg real-time scheduling to get rid of preemption) than move to a unikernel.


I kinda disagree. We looked at doing this and just couldn't fathom how it could be done. You might be surprised at the amount of code and entanglement there is to support multiple users, multiple processes - that touches everything from IPC/shared memory to address space to permissions. Very large deep cuts would have to be made and then you'd need to ensure your patchset stays congruent with upstream.

The other thing that people will look at is doing something like alpine && a heavy-handed seccomp/apparmor but then you're not really doing any cuts at all.


How does the mere existence of that code hurt your workload?


If you get a chance let me know what you think of our take on https://ops.city && https://github.com/nanovms/nanos .


I'm interested to learn why unikernels would be better at handling backpressure. Because there are fewer sources of interrupts that might interfere with each other?


Numerous interrupts may have an impact on latency, but for backpressure, buffering in the network layer or the filesystem/block layer can hide the limitations of the system from your application, and when you hit those limits, the hit is harder and can cause oscillations throughout a network or distributed system.

Now, of course, applications themselves can fail at this too. Look at Postgres, maybe, where typically people rely on [AUTO]VACUUM, but if your application runs continuously, then there is no right time to run it.


> Who exactly would make that claim?

Didn't you hear? Some.


Top men. Top. Men.


Top guns it's almost scient-a-logical


Security (general lack of user-land and inability to run more than one program), server density (run thousands of VMs/box).


That does very little for security, if it's not a net negative.

Most attacks aim at executing arbitrary code in the process space of the application being targeted. The lack of user-land does not help.



Who exactly would make that claim?

Will Serverless End the Dominance of Linux in the Cloud? https://dl.acm.org/doi/pdf/10.1145/3102980.3103008


"Also known as rump kernels a name inspired by the infamous purge of royalists from Parliament following the English Civil War this process involves creating a fork of an existing kernel codebase and manually purging it of the components deemed unnecessary to the target unikernel."

That appears to come from the FAQ on Github for "rumprun" dated 2015.

Rump kernels were introduced in NetBSD in 2009. The name was described then as an acronym for "runnable userspace meta programs".

https://blog.netbsd.org/tnf/entry/runnable_userspace_meta_pr...

I have not used rumprun for Linux but I still use the rump utilities included with NetBSD.


I'm one of the authors on this paper, so ask away if you have questions.


Correct me if I am wrong: unikernels basically allow a binary to run at Ring 0?

What do you think about WASM in the kernel, which shares the same high-level concept with unikernels? Link below:

https://destroyallsoftware.com/talks/the-birth-and-death-of-...


I'm not sure I can say anything intelligent about WASM in the regular kernel, but from a UKL point of view we're talking with Enarx and they're going to join us on one of our regular meetings in a few weeks. For more info on Enarx and web assembly see: https://www.youtube.com/watch?v=5wrQSe-IdMI


I'll admit I don't know much about WASM, but the lack of raw socket access and the 32-bit-only address space sound extremely limiting.

Until those are dealt with, WASM will never move past its current status quo: malware in the browser.


Really cool work!

Is there really a support layer for Emacs in the kernel?

Now the real question. You are statically linking against a modified glibc, so applications make calls into your kernel library instead of making system calls, and you can do some nice optimizations along the way. Which I think makes sense.

I can see this is cool for things that directly interface to glibc, but do you think this is viable for applications using multiple layers between the kernel and the application?


At the moment you must go through glibc. For example issuing SYSENTER or INT 80h will not work.

We have only run C applications so far, so I'm unsure how it will all work for servers written in other programming languages. I think applications which rely on complex multi-process middleware (like Java JBoss, etc) will be hard or even impossible to port.


You can always "unikernelize" the JVM. OSv had a kernel-optimized JVM for example.


That’d be great for many VMs. I think it’d be interesting to see how the Erlang BEAM would perform in such a scenario. Especially for embedded devices! Yes, sometimes a few dozen micros faster is all that’s needed for many applications to become practical.



Understandable, thanks for your answer!


Isn't it hard to develop/debug/monitor an application with this approach? I mean you can't just run a debugger as another process and attach to the thing; you can't run tcpdump or strace, nothing of that sort. Also, every bad pointer access will require a reboot, wouldn't it? I mean, how do you develop an application with this approach?


All good questions without a clear answer at the moment. Bad pointers in the application can overwrite kernel data structures because everything runs in a single address space.


The unikernel idea is 20-ish years old. I think those good questions cannot be much younger.

Because of that, I think “without a clear answer at the moment” is worrisome. Are there partial answers to these questions?


I mean "without a clear answer" for UKL which is only just over a year old and still in active and early development. There will probably be a way to attach gdb at some point.


> Also every bad pointer access will require a reboot, wouldn't it?

Hm, isn't a reboot just slightly more complicated than restarting your application? And most of today's apps would still require "rebooting", be it a container or a virtual machine.


Containers get "rebooted" when they go wrong too, so I'm not sure this is so different.


You can debug a normal kernel over the network. I imagine you can do the same with a unikernel.


I guess you would use an emulator to debug. Which seems like good workflow design to me.


One of our engineers spent yesterday debugging a networking issue on Google Cloud for Nanos.

Not saying it is easy but at the end of the day it's just engineering work.


Probably best to use a memory safe language so you don’t have bad pointer accesses and generally minimize debugging significantly.


What is a unikernel and why is it good?


Basically you replace the system calls into the kernel with library calls. So you end up with a single binary that contains both the application as well as the operating system functionality.

This can lead to some rather spectacular performance benefits. One thing is that you remove the syscall overhead, but more importantly you can do a lot more optimizations at the compiler/linking steps.
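A minimal sketch of the idea in C (names hypothetical, not any particular unikernel's API):

  #include <stddef.h>
  #include <sys/types.h>

  /* Conventional system: write() executes a syscall instruction, the
     CPU switches to kernel mode, the kernel runs, and control returns
     to userspace. In a unikernel the same symbol resolves to a plain
     function linked into the same image: */
  ssize_t write(int fd, const void *buf, size_t count) {
      return uk_sys_write(fd, buf, count);  /* hypothetical kernel-library entry point */
  }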

You can replace complex code that tries to dynamically model the world with rather plain code that does just what it is supposed to. With IncludeOS (a C++17 unikernel) we made a firewall implementation that, instead of using complex data structures, relied on preprocessing the rule set and translating it into C++ code. The code to do this was quite simple and resulted in pretty amazing figures w.r.t. the number of rules per packet per second. Metaprogramming can sometimes deliver amazing results.
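To give a flavour, the generated code might look something like this (a hedged sketch, not the actual IncludeOS output):

  #include <stdint.h>

  /* Generated from the rule set at build time; the compiler can
     inline and reorder this freely, unlike a generic rule walker. */
  enum verdict { DROP = 0, ACCEPT = 1 };

  struct pkt { uint8_t proto; uint16_t dport; };

  static inline enum verdict filter(const struct pkt *p) {
      if (p->proto == 6 /* TCP */ && p->dport == 22)  return ACCEPT;
      if (p->proto == 6 /* TCP */ && p->dport == 443) return ACCEPT;
      return DROP;  /* default-deny policy */
  }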

Similar things can be done in a traditional kernel as well, and we've seen eBPF firewall implementations that are able to get similar numbers. On the flip side, eBPF and its firewall implementation are an order of magnitude more complex. However, you can push eBPF code onto the networking card, which can be a huge benefit. And of course you could make a kernel module that implements a specific firewall ruleset and get similar performance without eBPF.
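For comparison, a single rule as an eBPF/XDP program looks roughly like this (a sketch under the usual libbpf conventions; real programs need more care):

  #include <linux/bpf.h>
  #include <linux/if_ether.h>
  #include <linux/in.h>
  #include <linux/ip.h>
  #include <linux/tcp.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_endian.h>

  SEC("xdp")
  int drop_ssh(struct xdp_md *ctx) {
      void *data = (void *)(long)ctx->data;
      void *data_end = (void *)(long)ctx->data_end;

      /* Every access must be bounds-checked or the verifier rejects it. */
      struct ethhdr *eth = data;
      if ((void *)(eth + 1) > data_end) return XDP_PASS;
      if (eth->h_proto != bpf_htons(ETH_P_IP)) return XDP_PASS;

      struct iphdr *ip = (void *)(eth + 1);
      if ((void *)(ip + 1) > data_end) return XDP_PASS;
      if (ip->protocol != IPPROTO_TCP) return XDP_PASS;

      struct tcphdr *tcp = (void *)ip + ip->ihl * 4;
      if ((void *)(tcp + 1) > data_end) return XDP_PASS;

      if (tcp->dest == bpf_htons(22)) return XDP_DROP;  /* the one rule */
      return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";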

I like unikernels because I feel I can understand them better. You can reason about them and they are easier to optimize for very specific workloads.

So why aren't we using them? Likely because we know Linux so well and we've invested a lot in it. It runs everywhere and is extremely well supported. Bringing a new operating system to market is challenging; bringing a new operating system with an entirely different architecture is even harder. Even if it could, on a purely technical level, yield better results.


Another performance benefit (unsurprising in hindsight) comes from removing KPTI and similar Spectre mitigations. These have really made system calls in regular kernels expensive.


On the other hand, if you run only one application on your VM you can just disable them, because once you have access to the userspace application there is nothing else on the machine to steal via side-channel attacks.


Absolutely. As of course you know but for the benefit of others reading this, Red Hat [we both work there] is concurrently looking at many different approaches (straight containers, Kata, restructuring QEMU, optimizing the kernel, unikernel Linux, and probably half a dozen others I've forgotten). I doubt that any single approach will be always better.


  Bringing a new operating system to market is challenging; bringing a new operating system with an entirely different architecture is even harder.

Being actually different is one of the few things that gives someone a reason to look twice at the new system.


Yeah I’m kind of amazed they don’t start out with explaining this. Usually there is at least a one-line explanation.

From Wikipedia:

“A unikernel is a specialised, single address space machine image constructed by using library operating systems. A developer selects, from a modular stack, the minimal set of libraries which correspond to the OS constructs required for their application to run. These libraries are then compiled with the application and configuration code to build sealed, fixed-purpose images (unikernels) which run directly on a hypervisor or hardware without an intervening OS such as Linux or Windows.”


> There has been a resurgence of research systems exploring the idea of the library OS, or a unikernel, a model where the target application is linked with a specialized kernel and deployed directly on hardware, virtual or physical

Third paragraph

> The unikernel is a cloud-era handle for the classic systems technique of linking an application with a library of operating system components (including memory management, scheduler, network stack and device drivers) into a single flat address space, creating a standalone binary image that is bootable directly on (virtual) hardware [22]. The advantage of this approach is that kernel functionality can be specialized to fit the needs of the target application, to increase the performance of the application or to support it within a highly restricted execution domain.

And the second section.


https://en.wikipedia.org/wiki/Unikernel

Whether it's good depends on what you're doing. UKL will[1] allow you to link a regular server application to Linux and then run that single binary on baremetal or in a hypervisor, and give you a decent performance boost over running the application on top of a normal kernel. How much of a performance boost depends a great deal on the program, almost everything will be a few percent faster, and some applications a lot more.

[1] I'm using the future tense here because although we do run real programs with it today there's still a bunch of development work to do before it works smoothly. The code is: https://github.com/unikernelLinux


As I've always understood it, it's not just performance but also security: there's a lot less code running with your code which cuts the surface area for bugs and thus security holes.


Wouldn't a crash or other problem in your program potentially corrupt e.g. the kernel tcp stack?


As the program and kernel run in a single address space, yes, there is nothing preventing your application from corrupting parts of the kernel.

I'm not very convinced about the security story around unikernels, but for balance the other side of the argument is that there's much less code around in a unikernel - no shell, no command line tools at all, no compilers or interpreters, just the code required to run the program and talk to the hardware (real or virtual).


Wouldn’t directly linking system calls make every address space more unique and thus make it harder to write (generic) exploits for?


The typical problem is that you end up distributing (eg) Apache 2.4.99 compiled for Unikernel on x86-64 via Red Hat Network to a million customers and they're all running the same binary. ASLR helps here ...


With full privilege I think you could directly write to hardware dma locations and look them up in the mmu, etc.


> there's a lot less code running with your code

Tell me if I am wrong, but is that always true? Unikernel libraries are definitely not as battle-tested as the Linux kernel. So how is a unikernel more secure than the Linux system call boundary?


Think of it as "Your app runs like an operating system".

Instead of installing an OS and installing your app on that OS, the OS is a library you use to talk to the hardware.

It's good for two reasons.

First security, if you only use the HDD and network, you don't have to include code for display or mouse etc. which could be a security risk.

Second, it's faster: because your app IS the OS, syscalls become plain function calls that never need to leave your address space.


What do you make of the highly critical Assessing Unikernel Security paper [0]? Have things changed since its publication?

[0] https://www.nccgroup.trust/globalassets/our-research/us/whit... , from https://news.ycombinator.com/item?id=19738905


You might enjoy reading my response to that:

https://nanovms.com/dev/tutorials/assessing-unikernel-securi...


I'll be frank: that article strikes me as weak sauce. It's coy about the identity of the paper it discusses. It's there in the title, and nowhere else, as if that's enough for it to be clear to the reader. Beyond this, the responses offered are not strong.

It makes good sense to compare unikernels against conventional GNU/Linux. It's the standard configuration, and it sets the bar. Describing this as a psychological hump isn't a sensible response.

I don't know what "advances to things like the networking stack" is meant to refer to. The article should be clear and explicit about what part of the paper it is responding to. As it stands it looks like a straw man.

The response to "unikernels are un-debuggable" makes little sense. Someone can work on a debugger while claiming the current state of debugging is poor. No contradiction there.

Downplaying the significance of Data Execution Prevention isn't a convincing defence. Downplaying the significance of ASLR isn't a convincing defence. These are valuable security measures.

This excerpt from the conclusion seems to indicate a reluctance to take security concerns seriously at all, as if a security analysis were an emotional attack: "there's a thousand little kids out there wanting to crap on anything they can. Security bugs are bugs at the end of the day and all one has to do is look at any random popular github to understand how many bugs we as humans create."


I can't read that paper. Firefox says the HSTS cert is wrong.

Edit: Thanks for the updated link. UKL can be compiled with usual hardening features like ASLR, stack hardening, RELRO.

As I said here (https://news.ycombinator.com/item?id=23202042) I'm dubious about the security story around unikernels, but most of the issues in this paper are not relevant to UKL specifically.


Very strange, it works fine for me regardless of browser.

Mirror: https://web.archive.org/web/20200311095425/https://www.nccgr...


you wrote this paper but can’t figure out how to solve this issue? really?


What are the plans regarding multithread / multiprocess support?

On the face of it, multiprocess support might seem a bit stupid, but it would greatly help the porting of more complex applications. The 'processes' wouldn't need virtual address space or memory protection, just give them a block of memory and pick a CPU to run the code.

If you have multithreading, do you need any scheduling code, or can you just pick a CPU for each thread and leave them to run? What about the kernel's own threads & processes?


UKL is multi-threaded, not multi-process, because there is no fork. Plus, it's a unikernel, i.e., just one process. You don't need any scheduling code; Linux does it like it does normally for kernel and user threads.


> You don't need any scheduling code; Linux does it like it does normally for kernel and user threads.

That's interesting; how does Linux implement its kernel threads without any scheduling code?


I don't know where things will go in the future, but now: Multiprocess no. fork() will fail. Pthreads works already.


Is the final vision a packager that bundles an application plus UKL to run on any virtual system? Maybe one could extend Packer to create this binary, similar to how it creates containers. Also, FaaS providers like AWS Lambda or Google Cloud Functions: aren't they already doing this?


I wouldn't say the final vision is clear yet, but what I'd like to see would look something like a "liblinux.a", and you would compile your application using your normal compiler, and link it to this library to end up with a vmlinux that would run in a VM or on baremetal.

Actually achieving this is tricky: At the moment you need to use special compiler flags and a completely different link line from normal. Then there's the issue of how/whether the library Linux is configured generically or you somehow allow people to set CONFIG_* options.


Thanks for taking questions. Are there outstanding work items for QEMU being statically linked with a unikernel?


It's not really something that anyone in the project has thought about. Running the vmlinux on baremetal is actually more interesting to us.


Why Linux?

Solutions based on NetBSD's rump kernels exist. There's nothing about unikernels that says Linux.


Linux is the most mainstream Unix-like server operating system. We're making only minimal changes to Linux and hoping to get them upstream. (The NetBSD rumpkernel is a bit different from this by the way.)


How transparent could the deployment target choice be? Could you deploy roughly the same source code to unikernel and non unikernel targets? Also, what exactly are these enormous performance benefits?


About the source code, all I can really say is that's our aim at the moment - to have the same source compiled for both non-unikernel and UKL targets. Note that really what we're aiming for is to recompile common servers out of the box with no changes, or the minimal changes possible.

Performance benefits: In truth we're still measuring this, there is a very smart student at BU who is investigating this and coming up with results for a forthcoming paper which I don't want to preempt too much. Almost all applications should see a small benefit, say of the order of 5%, but it's hoped that some will see a much larger benefit, and that it should be possible to make modest modifications in order to enjoy large benefits (perhaps only compiler and linker flags and such like).

These are very much hopes rather than firm promises at the moment.


Is it correct to think of it as application code running as PID 1 in kernel space? Can I fork or create threads? Can I directly access kernel's internal APIs like a kernel module?


> Is it correct to think of it as application code running as PID 1 in kernel space?

getpid exists and returns a number - in fact it can return any number you like :-) Everything is linked into a single vmlinux which runs in a single address space. (As an aside I'll have to remember to ask Ali if he's thought about what number getpid() should return in UKL - I guess returning 1 would cause least surprise)

> Can I fork or create threads?

Fork no, threads yes. Threads are implemented using kernel threads.

> Can I directly access kernel's internal APIs like a kernel module?

Yes, although whether this is a good idea is another matter. Note that it's still Linux so internal APIs are unstable and can go away or change their meaning at any time, so if your application starts to rely on internal APIs you could quickly get into trouble.


Slight clarification. Yes, UKL has normal kthreads, but all the application code runs using pthreads. And their implementation is unchanged, except for the fact that they make function calls instead of syscalls.
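So ordinary pthread code like this should, per the above, compile for UKL unchanged (a sketch, not code from the paper):

  #include <pthread.h>
  #include <stdio.h>

  static void *worker(void *arg) {
      printf("hello from thread %ld\n", (long)arg);
      return NULL;
  }

  int main(void) {
      pthread_t t[4];
      for (long i = 0; i < 4; i++)
          pthread_create(&t[i], NULL, worker, (void *)i);
      for (int i = 0; i < 4; i++)
          pthread_join(t[i], NULL);
      return 0;  /* fork()/exec() would fail under UKL, but threads are fine */
  }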


Not quite. Everything runs as PID 0. :-)

There are no processes here. Everything runs inside the kernel. You don't have exec(), fork() or similar.


Is this just bare metal embedded development, with a nice HAL?


We're hoping that many regular server-type applications can be ported simply by recompiling them.


Only reading the abstract, are you saying a different branch/fork of Linux could be maintained to be a unikernel?


(Not the author) No, the paper describes a small patch to Linux (~20 lines) + modifications to glibc that enable using Linux as a unikernel. Their goal is to have the modifications upstreamed so that Unikernel linux could be a GCC target etc.


I also don't get it. "Linux as a unikernel" does not make sense. The entire point of a unikernel is to not have the OS in every container/VM.


The crucial part of the paper is that they:

• Added a new kernel configuration option to allow the user to select if he/she wants to compile the Linux kernel as UKL.

• Added a call to an undefined symbol (protected by an #ifdef) that can be used to invoke application code rather than creating the first userspace process.

• Created a small UKL library which has stubs for syscalls. These stubs hide the details of invoking the required kernel functionality now that the regular interface (i.e., the syscall instruction) is no longer used.

• Changed glibc so that instead of making syscalls into the kernel, it makes function calls into UKL library.

• Changed the kernel linker script to define new segments such as thread local storage (TLS) segments which are present in application ELF binaries.

• Added a small amount of initialization code before invoking the application to replace initialization normally done by user-level code, e.g., for network interface initialization.

• Modified the kernel linking stage to include the application code, glibc and UKL library to create a single binary.

Basically, I think the idea is you get all the linux syscalls (+ filesystems... + network stack... + hardware support... + everything) for 'free'. Sure you'll have a pretty large binary, but you won't have to write anything.
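For illustration, one of those syscall stubs is presumably something along these lines (a hedged sketch; the paper's actual stub names may differ, though ksys_write is a real in-kernel helper in recent Linux):

  /* UKL library stub, compiled in kernel context (ksys_write is
     declared in linux/syscalls.h). Same shape glibc expects, but
     instead of executing a syscall instruction it calls straight
     into the kernel code linked into the same binary. */
  long ukl_write(unsigned int fd, const char *buf, size_t count)
  {
      return ksys_write(fd, buf, count);  /* in-kernel implementation */
  }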


Thank you! That great summary should be the top post.


I did copy the dot points directly from the paper. It was behind a university login for me, so I wasn’t sure anyone could read it.


If you statically link your software with the kernel, presumably an optimizing compiler will be able to remove most of the kernel from the resulting binary.


Yes. Also LTO becomes possible.


For those, like me, who don't know the acronym LTO: it's link-time optimisation, the stage during compilation where the "linker" has all of the object files which make up your program available to it, and is therefore able to optimise across the entire program at once (instead of across the individual constituent source files which make up the program).

There's a long, detailed PDF about LTO produced by GCC which was useful in helping me to understand exactly how this works here: https://gcc.gnu.org/projects/lto/lto.pdf


The changes to Linux are fairly small and the aim is to get them upstream. There's also a forked glibc and the changes there are a bit larger, although also hopefully upstreamable.


Unikernels are typically statically linked. Using copyleft code means the whole resulting binary is subject to copyleft.

Free software is great, but not everyone is in the position where they can release all of their code all the time.

For a unikernel to be viable it can't have copyleft code in it.


Depends what you're using it for. If you're just running it as a server then (non-A) GPL code is fine since you're not distributing the program.


But what about the cloud and auto-scaling? Wouldn't that count as distribution (though for internal use only)?


The GPL is all about giving rights to individual users of your program, not the world in general. It does so by turning copyright upside down, instead of restricting rights it grants rights (it’s like the opposite of those FBI warnings you get on DVD-discs) to those, and only those, who you give a copy of the program to.

But that is all. It works exactly like copyright in that regard. To be subject to it you need to have received a copy of the program.

So no, running the program in the cloud is not distribution.


It's an interesting point. I wonder if sending a binary to AWS counts as distribution to a third party?

(Of course I work for Red Hat and the cult thinks all code should be free :-)


Yes all code should be free!

(I work for Red Hat also :-)

I was originally just being silly but figured I would add something of substance here. I think the GPL goes a little too far but I like the spirit of it. As a user if I pay for an application, I should get a copy of the code for personal use as well. I don't think that gives me the right to distribute it because it's still somebody's property, but applications should be distributed to customers with their source IMHO.


Internal use is not distributing.


> Using copyleft code means the whole resulting binary is subject to copyleft.

No. That's formally wrong, and it doesn't imply what you think it implies.

Using copyleft code means that if you distribute the resulting binary, you also have to distribute the corresponding source code. If you can't do that, you can't distribute the binary.

Is that a problem? No. You distribute object files and libraries, accompany the copylefted libraries with their source code, and let the end user link them together statically. Problem solved. Arguably, that's how all software installation on Unix should work; it solves so many other problems, too.


You can write proprietary kernel drivers. Why can't you link user code with kernel? User code usually depends on POSIX interface and Linux is just one of the implementations.


> You can write proprietary kernel drivers.

IIRC you can write proprietary kernel modules, which are loaded at runtime.

> Why can't you link user code with kernel? User code usually depends on POSIX interface and Linux is just one of the implementations.

Because the kernel is licensed under GPLv2. If you statically link to the kernel, then all interpretations of the GPL are that your work is a derivation and must be GPL'd (at least assuming it is "distributed").


Linux is licensed under GPL-2.0 WITH Linux-syscall-note.

>NOTE! This copyright does not cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does not fall under the heading of "derived work".

(even without actual syscall, you can consider unikernels as normal use of kernel)


The original comment specifically put forward the following scenario:

> Unikernels are typically statically linked.

The note you're quoting is about regular software performing regular syscalls, which is runtime dynamic linking to the kernel (so much so that multiple OSes (e.g. SmartOS, Windows) implemented a Linux persona which allows running unmodified Linux software against a non-Linux kernel).

It does not apply in a scenario where you would statically link to the kernel.


> It does not apply in a scenario where you would statically link to the kernel.

That's not the interpretation everyone else uses.


That's completely irrelevant to this discussion.

It's a well-defined axiom of the original comment, and we're discussing logical consequences of that axiom. If you want to disagree with the original axiom then go and do so to the original comment.


So if your "userspace" payload is dynamically linked and loaded at runtime, it's no longer a problem? I don't think that there is a big performance hit for dynamic loading. Probably something like LTO won't work.


> So if your "userspace" payload is dynamically linked and loaded at runtime, it's no longer a problem?

Indeed but then you’re kinda missing the point of unikernels.


I thought that main point of unikernels is to avoid cost of kernelspace-userspace switch.


Well yes, if you want to distribute binaries to people and keep the source private (like in the '90s), copyleft unikernels aren't the way to do that.

Unikernels are viable even if they don't allow you to do that, for many reasons - performance, workload isolation, ease of installation.


This is not categorically true.

Linux is licensed under GPL-2.0 WITH Linux-syscall-note.

>NOTE! This copyright does not cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does not fall under the heading of "derived work".

More generally, you can link LGPL code statically without making the resulting binary subject to copyleft or releasing the source code. An increasing number of programmers don't know what an object file is anymore (they didn't learn to program in C/C++) and this causes endless confusion.

https://www.gnu.org/licenses/gpl-faq.html#LGPLStaticVsDynami...

> For the purpose of complying with the LGPL (any extant version: v2, v2.1 or v3):

> (1) If you statically link against an LGPLed library, you must also provide your application in an object (not necessarily source) format, so that a user has the opportunity to modify the library and relink the application.


> This is not categorically true.

GP provided a clear scenario in which their warning is categorically true.

> Linux is licensed under GPL-2.0 WITH Linux-syscall-note.

The note clarifies that syscalls are interpreted as a form of dynamic linking, which would be quite obvious in retrospect given you can run unmodified linux software on non-linux kernels (https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux).

> More generally can link LGPL

You do realise the LGPL and the GPL are different licenses with different requirements and constraints right? And the linux kernel is not licensed under the LGPL? And that what you quote specifically notes that even under the LGPL, static linking results in a derived work?


> And that what you quote specifically notes that even under the LGPL, static linking results in a derived work?

No. You can provide object files without source code and it does not become covered work. This is for example how using LGPL in Qt works.


[flagged]


> What prevent you from releasing code "all the time"?

That you don't have the rights to it, and / or that the code was written under a contract or grant which prevents its release.


> What prevent you from releasing code "all the time"?

Sometimes the same thing that prevents you from giving away all the results of your work for free. Can I have a copy of the code you're writing for your employer now? Surely everything you work on is made freely available to the public in its entirety. I hope you weren't just virtue signaling but doing the opposite in exchange for a salary.


Very often corporate wants to keep the application code private.


Show me where Microsoft releases all of the source for Windows, or Apple, for Mac OS and iOS.

What's preventing them from releasing code "all the time"?


Can a unikernel be used to create a sound/music application which is not afflicted by the latency problems which bedevil such applications when running inside traditional operating systems?

For example, could it be used for a softsynth which takes in midi input and emits digital audio with the OS only contributing guaranteed sub-millisecond latency?


is that not what the linux realtime patches do? https://wiki.archlinux.org/index.php/Realtime_kernel_patchse...


I've been researching this for a long time, and it's unclear to me whether Linux with PREEMPT_RT patches would meet that sub-ms requirement. I see numbers all over the place from various sources, from sub-millisecond to more than 10 ms.

10 milliseconds is often considered the threshold of human perception with regards to audio latency. (It's possible that it's lower under some circumstances, but 10 ms is a good enough approximation for my purposes.) My goal is to create a sound/music tool which runs lots of DSP and which has a total system latency of under 10 milliseconds, including OS, application (including intrinsic latency of DSP procedures), hardware, and sound in air (about 1 ms per 3 meters).

When I read about latency, I often see "we're at 6-9 ms, that's good enough because it's not perceptible". Unfortunately, that's not good enough if there are several components which contribute to total system latency and they are all pushing 10 ms. Hence, my sub-ms requirement for the OS.

Committing to a platform will be a costly choice. I don't want to invest in writing for realtime Linux only to find that I really need to run a hardcore RTOS, or run dedicated DSP chips, etc.

That's why I'm interested in whether running a unikernel can offer stronger guarantees.


I write realtime audio software for Linux, and have done so for 20 years.

While your goal is admirable - it certainly is possible to come up with scenarios where "sub-ms" latency is desirable - it's really not relevant.

Your stated goal ("...lots of DSP ... under 10 msec") is already entirely achievable on Linux, assuming you're close enough to your speakers (or wearing headphones).

But sub-msec can only make sense here if it describes scheduler latency, since there's no audio hardware that can function in the sub-msec range. The linux scheduler is way, way below that threshold and SCHED_FIFO threads will see that performance barring hardware issues (c.f. https://manual.ardour.org/setting-up-your-system/the-right-c...)
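For anyone unfamiliar, putting an audio thread under SCHED_FIFO is a few lines (a minimal sketch; the priority value is arbitrary, and real code should check rtprio limits):

  #include <pthread.h>
  #include <sched.h>
  #include <stdio.h>

  /* Promote the calling thread to SCHED_FIFO so the kernel wakes it
     ahead of all normal (SCHED_OTHER) work. Needs CAP_SYS_NICE or an
     appropriate RLIMIT_RTPRIO / limits.conf entry. */
  static int make_realtime(int priority) {
      struct sched_param sp = { .sched_priority = priority };
      int err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
      if (err != 0)
          fprintf(stderr, "SCHED_FIFO failed: %d\n", err);
      return err;
  }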

Finally .... writing audio software is a lot of fun. But please don't just jump in without seeing if you can instead contribute to an existing project first. The field is littered with the dead and half-dead corpses representing the discarded work of developers who thought it would be fun, and then moved on.


I'm curious what you think of PipeWire:

https://gitlab.freedesktop.org/pipewire/pipewire/-/wikis/Per...

It claims to get near Jack latency, while implementing a more generic audio graph with better security than PulseAudio.

In regards to Linux scheduler performance, do you have any experience with SCHED_DEADLINE?


We don't use SCHED_DEADLINE with audio, because things are driven by hardware interrupts/requirements, which makes SCHED_FIFO more appropriate ("wakeup! time to do the work! right now! until it's done!"). macOS doesn't really have SCHED_FIFO, so we end up forced to use something akin to deadline scheduling there. It works fine.

I commented a lot on PipeWire when it started. Thankfully Robin Gareus and others managed to get its developer to change course somewhere along the journey to use a better basic model (notably, pull not push). I no longer follow its development. If it works, it will be good for Linux audio. I don't know if it will, or won't.

[for anyone who doesn't know, I'm the original author of JACK]


Hey, thanks for all your work.


As for contributing to existing projects, I hear you — I've been highly active in Open Source for well over a decade. Conceptually, the closest thing to what I want to do is Pure Data — and I have in fact contributed to it.

But I'm also extremely motivated and willing to go all the way down and write the entire thing from scratch if I have to. Or to learn enough about every last step in the chain that I can actually control for latency — and it may be that it takes about the same amount of work. Finding numbers I can trust on latency seems hopeless. Everybody fudges rather than fails.


The word "latency" covers many things, many subtly different from each other. One of its meanings, unrelated to the one you're talking about, is the delay in signal flow caused by algorithms (many digital filters, for example). Often called "plugin delay compensation", or more generically "latency compensation".

Let me just point out that it has taken 20 years and a guy whose PhD thesis was about latency compensation in a DAW to finally "fix" this sort of "latency" in Ardour. This is a massively harder problem from a design perspective than the scheduling latency issues you've referred to.

[ EDIT: to be fair, the actual correct solution to latency compensation didn't take 20 years to implement, more like a year or so when taking place within the context of a large existing code base. ]


The complexity of solving such problems generally is actually what drives me to start from scratch: rather than solve an intractable general problem, instead limit the scope of where the project must run and what it must do.

My perception is that I am not capable of reliably tuning a general purpose OS for a total system latency of under 10 msec. I can't give the total system a hard number and believe that it will obey; instead, I need to perform a lot of esoteric tweakery of subsystems I probably don't understand. The system won't alert me reliably when it fails, but will instead either drop out or just give me more latency than I asked for — and there are innumerable factors outside my control that could cause it to fail.

However, the composition tool I want to create has a pretty small set of requirements, if I accept that it only needs to serve my particular use case. So how about I ensure that my app is the only thing running on the hardware, via unikernel, or RTOS, or even bare metal?

Implementing all of my compositional requirements is probably easier and certainly more rewarding than tweaking latency parameters without having confidence that my results will be enduring or predictable.

If it turns out that the knowledge I gain during that exercise allows me to control latency well enough and I can return to mainline operating systems and contribute to existing projects, all the better. I don't really want to go down this path, but I'm not willing to accept a tool that maybe-kinda-sorta-sometimes meets my absolute requirements.


You still won't be safe against several/most of the causes outlined in the page from the Ardour manual that I linked to.

SMIs, random hardware that locks the bus for too long ... out of the kernel's control.

If you really want 10msec finger->ear (a goal that I would say is reasonable, though given many live performer's and mixing engineer's normal physical setups, probably excessive) and you want to guarantee it, it's not just the OS but the hardware you'll have to change. You cannot guarantee the required scheduling on general purpose (intel/amd) motherboards unless you take great care with the selection, and even then ... I've heard that financial services/investment firms are the main reason you can still buy mobos without SMIs, because their "latency" requirements would be broken by these interrupts.

On the other hand, the "not-guaranteed" case with a reasonable mobo, sensible bus connected devices, an RT kernel and properly written user space software is going to work almost all of the time. Just no guarantees.


Paul,

Thank you very much — for your ongoing work in Open Source audio, for being willing to engage at length in this thread, and for being straightforward about what the system can deliver.

> If you really want 10msec finger->ear (a goal that I would say is reasonable, though given many live performer's and mixing engineer's normal physical setups, probably excessive)

I worked in a recording studio for 6 years, including two years as a mastering engineer. Most of the sonic adjustments I would make during mastering and mixing fell below the threshold of perception — but when added together they would produce something well above the threshold of perception.

There's nothing magic about this. I don't have "golden ears" (although because I've trained I can more quickly identify certain patterns than people who haven't trained).

The point is simply that an aggregation of imperceptible changes can sum to a perceptible result. It's akin to why you perform intermediate processing in both video and audio at a higher resolution than the final delivery medium: otherwise an accumulation of small, possibly imperceptible degradations will cause perceivable degradation of the finished product.

And so, I dispute the idea that just because there are other sources of latency, we should resign ourselves and accept substantial contributors to latency which fall below perceptual threshold. The only number that matters is the final sum of all latencies.

> You cannot guarantee the required scheduling on general purpose (intel/amd) motherboards unless you take great care with the selection, and even then ... I've heard that financial services/investment firms are the main reason you can still buy mobos without SMIs, because their "latency" requirements would be broken by these interrupts.

With this in mind, I will set aside one possibity I'd considered: writing for general purpose CPUs (e.g. multicore x86_64) outside of mainstream operating systems.

Instead, while I'll continue prototyping the project on mainstream operating systems, I'll probably look more deeply into dedicated outboard DSP boards.

> On the other hand, the "not-guaranteed" case with a reasonable mobo, sensible bus connected devices, an RT kernel and properly written user space software is going to work almost all of the time. Just no guarantees.

I appreciate how hard you've worked to achieve that.

My question, then, is how can I be confident that I'm actually meeting these "almost-all-of-the-time" latency requirements?

In my experience, most systems recommend that you lower the latency until you hear clicks and pops. That convention leaves me... dissatisfied. A dropout is a detectable event, and the monitoring system should surface it.

Just as boggling, when you have exacting standards, is when the system falls back and delivers something subtly degraded without telling you, like changing the latency without warning because the system would otherwise go down. I understand why systems are designed to prefer degradation over failure, but for my purposes I need to know when it happens. Expecting me to monitor continuously for an effect at the threshold of perception, such as subtly increased latency, is draining, and ultimately unreasonable.

We have meters for noise floors and red lights indicating that clipping occurred. What facilities exist to help me understand when the latency behaviors of my rig are not meeting my requirements?


By "probably excessive" I wasn't referring to psycho-acoustics. I merely meant that given your willingess to include speaker->ear latency, many people work on music in scenarios where that measure alone is already close to or above 10msec. The worst case scenario is likely a pipe organ player, who deals with latencies measured in units of seconds. Humans can deal with this without much difficulty - it is jitter that makes it hard (or impossible) to perform, not latency. Long-standing drummer & bass player duos generally report being able to deal with about 10 msec when performing live.

On the flip side, you have people arguing convincingly that the comb filtering caused by phased reflections inside almost every listening scenario is responsible for the overwhelming majority of what people perceive as "different". Move your head 1 ft ... lose entire frequency bands ... move it again, get them all back and then some!

Regarding latency deadlines: well, the device driver can tell you (and does, if you ask it). If you use JACK, it will callback into your client every time there is an xrun reported by the audio hardware driver. This in turn has a quite simple definition: user space has not advanced the relevant buffer pointer before the next interrupt. There are circumstances where this actually isn't a problem (because the data has already been handled), but it is a fairly solid way of knowing whether the software is keeping up with the hardware. Something using ALSA directly can determine this in the same way that JACK does.
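Concretely, a client can subscribe to those xrun reports with a few lines against the JACK API (a sketch):

  #include <jack/jack.h>
  #include <stdio.h>
  #include <unistd.h>

  /* JACK invokes this from its own thread whenever the driver
     reports that userspace missed a buffer deadline. */
  static int on_xrun(void *arg) {
      (void)arg;
      fprintf(stderr, "xrun: missed an audio deadline\n");
      return 0;
  }

  int main(void) {
      jack_client_t *c = jack_client_open("xrun-monitor", JackNullOption, NULL);
      if (!c) return 1;
      jack_set_xrun_callback(c, on_xrun, NULL);
      jack_activate(c);
      for (;;) sleep(1);  /* just monitor */
  }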

For audio, there is no other measurement of this that really matters. Using some sort of system clock to try to check timing, while likely to be kinda-sorta accurate enough, ignores the fact that the only clock that matters is the sample clock. If you're operating with huge margins of safety, some other clock measuring time is good enough, but as you begin to inch closer to problem territory, it really isn't. For reference, we generally find that when "CPU loads" (variously measured) get close to 80% on macOS and Linux, scheduling deadlines start failing.

Nothing on linux will automatically "fallback" to less demanding latency requirements. If the system can't meet the requirements of the audio interface, it will continue to fail. This is actually part of the reason why Ardour tends not to deactivate plugins - the user can expect the DSP/CPU load to be more or less constant no matter what they do, rather than being low and then climbing through a threshold that causes problems as they do stuff.


If you want to check, I have a small (and not very good tbh) benchmarking chapter in my thesis where I detail the steps I have followed to get fairly low scheduler tick times of my software - for simple graphs the best I could do was to go to less than 100 microseconds with ~20 microseconds of jitter (p.187) for a tick:

https://tel.archives-ouvertes.fr/tel-01947309/document ;

also see the intro of section 10.4 with a couple relevant references. Of course that only benchmarks the part that you as a developer can have a meaningful impact on, not the time spent in JACK / ALSA / ...

To give an anecdote, with the hardware mentioned at the beginning of said chapter I can reliably put JACK in a 16 samples buffer at 44100 (~0.7 ms) and do a few things without crackles. not a lot though :p


What audio hardware are you using that allows 16 samples per buffer? Must be something ancient, I think?


it is my pci-express multiface II, that I will cherish until death - it has been rocking hard for the last 11 years so far :-) so I'm hopeful I can get a bit more out of it !


The site has just the right number of modal windows hiding the content: a cookie consent form, a COVID-19 notice at the bottom, some recommendations on the right, and, if you scroll back up, the site banner...


https://www.researchgate.net/publication/332329656_Unikernel...

researchgate is much, much nicer, and gives a lot of info: citations, related papers; you can go see info about authors, ask questions, see what project this paper is part of. ResearchGate allows discovering interesting papers in a seamless manner.


The old dl.acm was no amazing bit of modern design, but it was definitely much better than this mess.

Yet again the ACM showing how out-of-touch it can be with its members and patrons.


Eh, sounds like every other website to me.

Wouldn't people say ACM are out of touch if they don't adopt all the common junk you see on websites? Their website is so out of date it doesn't have a cookie banner! Best I get my information from somewhere that keeps up with the times ... and the law!


My gut instinct is always that using traditional kernels in a unikernel way is a bit suboptimal because it doesn’t become a “library operating system” in the same way that Mirage does.


Only the bits of Linux which are used are linked in, same as when you link together any program. The big advantage of using Linux is driver support - you can run a UKL application on baremetal, linking to the drivers needed to run on the target hardware.
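As a general illustration of how that works (standard toolchain flags, not the actual UKL build invocation): when statically linking, each function can be given its own section, and the linker then garbage-collects every section nothing references.

  # illustrative flags only
  gcc -ffunction-sections -fdata-sections app.c -o app \
      -Wl,--gc-sections -static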


Oh! That’s very cool


While that's true, the flip side is that "traditional" kernels have much better hardware (and general) support because they benefit from the long history of the "main project".

Stripping down traditional kernels to work as unikernels is not a new thing: https://github.com/rumpkernel/rumprun


Does it really matter? Are people actually planning on running unikernels on bare metal, a situation where you would need significant tooling to manage them?

MirageOS was clearly designed from the start to run on top of a hypervisor. The idea was going from host OS, guest OS, and application to just host OS and unikernel.

As I see it, it was a different way of solving the same problem containers are now used for: do you really need both a host OS and a guest OS when all you want is isolation? Containers push isolation into the kernel while losing actual virtualization (in a way, merging host OS and guest OS), whereas unikernels push the useful parts of the guest OS into the application (merging guest OS and application), keeping the benefits of full virtualization. I think that's why Docker bought the company making MirageOS.

I don't really see where rump kernels sit in that picture. Clearly there is interest, since people are working on them, but I fail to see where they would be useful.


When I make a system where a unikernel could do the job, I usually want a full OS for program setup and initialization, and then an isolated core for the main loop, sharing (single-writer) memory pages with other, less performance-critical processes for logging, stats reporting, and any needed file system activities.

The makers of top-performing NICs have been quite good at providing direct user-space access to their hardware, typically by exposing a ring buffer in shared memory, and maybe mapping device registers too, so that the process on the isolated core never does another system call until shutdown weeks later.
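A minimal sketch of that single-writer sharing (C11; names and sizes are illustrative, not from any particular product): the isolated core publishes records without ever blocking or making a system call, and a housekeeping process on another core drains them.

  #include <stdatomic.h>
  #include <stdint.h>

  #define RING_SLOTS 1024   /* power of two */

  struct ring {
      _Atomic uint64_t head;              /* written only by the producer */
      _Atomic uint64_t tail;              /* written only by the consumer */
      uint64_t         slot[RING_SLOTS];
  };

  /* Producer: runs on the isolated core; never blocks, never syscalls. */
  static int ring_push(struct ring *r, uint64_t v)
  {
      uint64_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
      if (h - atomic_load_explicit(&r->tail, memory_order_acquire) == RING_SLOTS)
          return 0;                       /* full: drop rather than stall */
      r->slot[h & (RING_SLOTS - 1)] = v;
      atomic_store_explicit(&r->head, h + 1, memory_order_release);
      return 1;
  }

  /* Consumer: runs on a housekeeping core. */
  static int ring_pop(struct ring *r, uint64_t *out)
  {
      uint64_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
      if (t == atomic_load_explicit(&r->head, memory_order_acquire))
          return 0;                       /* empty */
      *out = r->slot[t & (RING_SLOTS - 1)];
      atomic_store_explicit(&r->tail, t + 1, memory_order_release);
      return 1;
  }

The struct would live in a shared mapping (e.g. under /dev/hugepages) so both processes see the same pages.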

It is some hassle to get customers to add boot flags (isolcpus=, nohz_full, rcu_nocbs=, rcu_nocb_poll, hugepages=, etc.), to put any mapped files in /dev/shm or /dev/hugepages so the kernel won't invent excuses to block the process, and to direct IRQs to other cores; but unikernel setup is probably not simpler.
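For concreteness, the kind of boot line this amounts to (core numbers and page counts are illustrative, not a recommendation):

  isolcpus=2,3 nohz_full=2,3 rcu_nocbs=2,3 rcu_nocb_poll hugepages=512

with the main loop then pinned to one of the isolated cores via sched_setaffinity() or taskset.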

So, I'm not sure what a unikernel would get me. Portability, or independence from proprietary drivers?


I don't think the tuning you refer to is sufficient to bring platform noise down to the levels required for some workloads. You will still see a lot of system call interrupts, TLB shootdowns, timer events, etc. Funnily enough, just yesterday I published an article tangential to this problem space.

http://bitcharmer.blogspot.com/2020/05/t_84.html

I'm not an expert on unikernels but my assumption is that you will see none of that OS jitter.


If you have only one thread running in the process, memory usage is static, any mapped pages are writable only by that one process and are not backed by external storage, you have turned off timer interrupts and RCU callbacks, and you perform no system calls, then where are these TLB shootdowns and timer interrupts coming from?


Good writeup, but is there any mitigation for TLB shootdowns?


TLB shootdowns are a product of multithreading, and of unmapping memory. Avoid one, the other, or both, and TLB shootdowns fade out of the picture.

You still have cores to isolate, busybody kernel threads to suppress, and hardware interrupts to direct elsewhere, but TLB shootdown paranoia is largely a product of the current fashion favoring multi-threading over running separate processes with carefully chosen sharing.
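To make that concrete, a hedged little demo (Linux/x86, illustrative only; build with gcc -lpthread): with a second thread keeping the address space live on another core, every munmap() forces a cross-core invalidation, which shows up on the "TLB shootdowns" line of /proc/interrupts. Drop the thread, or stop unmapping, and those counters stay quiet.

  #define _GNU_SOURCE
  #include <pthread.h>
  #include <sys/mman.h>

  /* Keeps our mm loaded on another core, so unmaps must reach it. */
  static void *spin(void *p) { (void)p; for (;;) { } }

  int main(void)
  {
      pthread_t t;
      pthread_create(&t, NULL, spin, NULL);
      for (int i = 0; i < 100000; i++) {
          char *m = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          m[0] = 1;           /* touch, so a TLB entry exists */
          munmap(m, 4096);    /* unmap -> shootdown IPI to the spinning core */
      }
      return 0;
  }

Compare `grep TLB /proc/interrupts` before and after a run.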


Recently I found out that eBPF JITting will cause interprocessor interrupts even with all the isolation configs available today. There are probably many more places where such interrupts can be generated without being handled by the current isolation mechanisms. I'm not saying a unikernel is directly a solution, but CPU isolation isn't perfect either and could borrow something from unikernels.


Obligatory plug for Elias Naur's unik, a unikernel written almost entirely in Go, which can link with and run unmodified Go binaries: https://git.sr.ht/~eliasnaur/unik

Demo with virtio-gpu support: https://twitter.com/eliasnaur/status/1249765031299952646


So apparently you need cookies to show text.

"We use cookies to ensure that we give you the best experience on our website.

It seems your browser doesn't support them and this affects the site functionality."


If you're too lazy to read the whole paper I've made a quick explainer video:

https://youtu.be/3NWUgBsEXiU


Wonder how this differs from exokernels: https://pdos.csail.mit.edu/archive/exo/


Exokernels are orthogonal to unikernels.

Unikernels run exactly one application, but can be based on an arbitrarily complicated OS specialised to that one application. Exokernels can run arbitrarily many applications, but OS functionality is minimal and limited to ensuring protection and multiplexing of resources.

Specialising a rich OS (such as Linux) to a single application might yield OS functions that are as restricted as those you find in an exokernel.


Full disclosure: I have only briefly fiddled with unikernels, and in all honesty I think there could be a good real-life application for them. But the biggest problem I see is that even though they have been around for a while, building them is still incredibly complex and time-consuming, and there isn't a core community behind them. Looking at the field, they seem to be starting to suffer from the JavaScript syndrome: hundreds of single-maintainer projects or micro-communities doing their own thing and putting a sticker ("unikernel" in this case) on top. As I said, I haven't paid a whole lot of attention to them, and I'm hoping someone is addressing these issues.


Isn’t that what this whole thread is about? Making it easier by providing Linux as a compilation target.


Well, one of the problems here is that it's kernel work, and not everyone has the skillset for it; worse, no one wants to fund it.

However, the tooling is changing.


This is excellent. Where is the code? Do you have examples?


https://github.com/unikernelLinux

We have memcached compiled for UKL but for some reason Ali has made that repo private (it's under the same namespace as above). I will ask him if he can make the other repos public this week.


[flagged]


A simple disk format, the unikernel.


Three objections: tooling, tooling, and tooling.


Full article text:

http://sci-hub.tw/10.1145/3317550.3321445

Maybe link the top-level post there?


There is a link to the PDF on the page. No ACM membership required.

https://dl.acm.org/doi/pdf/10.1145/3317550.3321445


The link was not obvious to me. What is supposed to be wrong with using Sci-hub?



