What I don't like about Docker is that it spews stuff all over the place.
After installation, it is constantly running. As can be seen by:
ps aux | grep docker
And it occupies IPs:
ip addr
I was on a train in Germany recently and could not use the WiFi because of that: it turned out the Docker daemon occupied the IP range the train WiFi uses, while I was not using Docker at all.
I guess it clutters even more stuff. Any suggestions what else to look for?
So I am looking for a cleaner container solution. One that feels more like a Linux tool that keeps the system intact and only runs when it runs.
If Kata is such a tool, I would look into it more closely.
Podman is probably close to what you want. It runs "daemonless" - while no containers are running, podman doesn't have a running process either. Also, as long as the containers are run in rootless mode, podman creates no virtual network interfaces. Rootless Podman makes use of network namespaces instead to separate the processes in the container from the host network. Processes on the host cannot see Podman's network namespaces and therefore are completely unaffected by any IP configuration therein.
Podman is what Docker should have been, for me. Security first, no daemon, more Linux-like behavior (you can manage them with systemd unit files if you wish), and it supports the same, usual container images you build/use with Docker.
The main part it was lacking is the compose equivalent, but that too is coming along.
(It's fun to note how systemd was an epitome of un-Linux-like software 7-8 years ago, and now it's the opposite. I'm not talking about systemd's merits here, just about the change in perception.)
Comparatively little harsh criticism has been aimed at systemd the init system; most has concerned systemd-the-almost-NTP-client, systemd-the-binary-logfile, the various related XML documents, things like that.
Honestly with quadlet it might be there on the compose front: being able to deploy either as systemd-like files or as Kubernetes manifests probably solves the entire problem in a very nice way (the K8S compatibility is the real magic IMO since it's the defacto cloud ecosystem).
I haven't played with any of Podman's Kubernetes YAML stuff yet, but we target Kubernetes.
Does it support higher-level declarations like Deployments and StatefulSets? I'm trying to understand how/if we could use this without having to write new manifests. A (very) quick search didn't clarify it for me.
Quadlets just create a systemd unit file to launch containers with podman and have systemd manage its lifetime. Since systemd lacks the ability of controllers like Deployment and StatefulSets, I doubt that quadlets are able to achieve much more.
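For concreteness, a quadlet file is just an ini-style unit dropped into `~/.config/containers/systemd/` (for rootless use); a minimal sketch, with a hypothetical name and image:

```ini
# ~/.config/containers/systemd/web.container
[Unit]
Description=Example web server container

[Container]
Image=docker.io/library/nginx:latest
PublishPort=8080:80

[Install]
WantedBy=default.target
```

After `systemctl --user daemon-reload`, quadlet generates a `web.service` unit you can start and enable like any other systemd service.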
I'm pretty sure, that cleverly combining various unit types and their capabilities/attributes would allow to cover 90% of what's needed to emulate Deployments and StatefulSets.
I find Podman frustrating at times because of how strict it is with ACLs, UID maps and its SELinux integration. I've had a chown/chmod -R of my own home folder (for migration purposes) cripple all podman (AND toolbox) related stuff for my user.
All in all it's a great program and IMO even better than Docker, but it would be great if people didn't try to make it sound like it's a 1:1 replacement for Docker, because it has its trade-offs.
I would argue that the selinux and uid issues with podman are selinux and posix DAC’s fault.
SELinux is neat but is IMO conceptually wrong in a container world. UIDs are barely better. At most, these mechanisms should confirm that the container as a whole has a given permission, and that's it.
(Seriously, the major clouds have deprecated per-object permissions on their object stores. IMO they are right to have done so.)
It might feel frustrating, but there is a great reason.
It gives you a more secure environment.
When Docker just "works", you are unaware that you have granted more permissions than the app might actually need, increasing the attack surface.
Podman's use of SELinux prevents or mitigates many of the issues you would face if someone targets your app.
To be honest I never even tried out Podman, even though I keep reading positive things about it. But the entire Docker ecosystem is complex enough on its own, so I fear that exchanging the container runtime entirely will cause a lot of problems that I don't know about today.
Podman has docker-compatible CLI, Dockerfiles and even docker-compose compatibility. It won't be as painful as you worry. So just give it a try. And besides the advantages already mentioned (rootless, no-daemon and isolated network namespaces), podman has some additional nifty features like pods (like in K8s), quadlets (containers managed by systemd), compatibility with k8s manifests, etc.
undeniable that you'll have a thing or two pop up—whether or not it's worth it depends on how much you want to understand. despite differences in how it works, podman _is_ simpler.
the reality though is that you can install both docker and podman, and just start/stop docker as necessary. it's easy enough to experiment with podman on a system with docker installed.
imo it's similar to folks that think learning a new shell is a huge undertaking. the reality is you just install the new thing and drop in and out of it as you get comfortable. if it sticks, cool, if not, also cool.
you can also use docker rootless if you want that style of networking. Indeed AFAIK docker and podman rootless both use the same networking approach (slirp4netns)
I find this take so interesting. Here's a tool so faultless that your biggest gripe with it is some obscure interaction 99.99% of people won't even be aware of, and you want to replace the tool because of it.
This would be the equivalent of me buying a great car and disliking the fact that when you pull the carpets up there's a hard to reach clip that needs to be undone, so I decide to sell it and get a different car.
I see this happen with all great tools. It irons out all the important kinks, and people still find some obscure reason to fault the tool enough to switch.
(To be clear, I am aware Docker has issues beyond what GP is dealing with)
(Apologies for the strong tone of the comment, it's not intended, but could not find a better way to word it)
That is not an obscure issue. The common manifestation - Docker rewriting iptables rules when you expose a port, messing up the software firewall - has caused a lot of wasted hours and security issues.
You might be used to seeing addresses from 192.168.0.0/24 and 192.168.1.0/24 in home networks, and addresses from 10.x.y.0/24 in corporate internal networks.
But all of 172.16.0.0/12 has exactly the same kind of purpose as do 10.0.0.0/8 and 192.168.0.0/16.
The people who set up the network on the train did nothing wrong by using a subnet of 172.16.0.0/12.
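As a quick sanity check, Python's standard ipaddress module can confirm that Docker's usual default bridge subnet sits squarely inside one of the RFC 1918 private ranges:

```python
import ipaddress

# Docker's default bridge (docker0) typically sits at 172.17.0.0/16.
docker_bridge = ipaddress.ip_network("172.17.0.0/16")

# The three RFC 1918 private ranges:
rfc1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

for net in rfc1918:
    if docker_bridge.subnet_of(net):
        print(f"{docker_bridge} lies inside {net}")
# → 172.17.0.0/16 lies inside 172.16.0.0/12
```

So any network (like the train WiFi) legitimately using 172.16.0.0/12 space can collide with a default Docker install.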
Does anyone configure other IP ranges in Docker? I know there are other reserved IP ranges you might get away with. There's the CGNAT IP range 100.64.0.0/10, and there's the link-local IP range 169.254.0.0/16. These are unused in most situations and may work fine for Docker networks.
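The appeal of those two ranges is that they sit entirely outside RFC 1918 space, so they can't collide with a typical home or corporate LAN - easy to verify with the ipaddress module:

```python
import ipaddress

# Ranges sometimes repurposed for container networks:
cgnat = ipaddress.ip_network("100.64.0.0/10")        # carrier-grade NAT, RFC 6598
link_local = ipaddress.ip_network("169.254.0.0/16")  # link-local, RFC 3927

rfc1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Neither overlaps the private ranges a LAN is likely to use:
for candidate in (cgnat, link_local):
    assert not any(candidate.overlaps(net) for net in rfc1918)
    print(f"{candidate}: no overlap with RFC 1918 space")
```

Of course, you'd still collide if your ISP actually uses CGNAT addressing on your uplink, so this is a pragmatic workaround rather than a guarantee.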
Docker Compose creates a new network for every project, and eventually overlaps with something important. They are fairly large ranges by default, so you end up taking up a lot of address space fast if you're not careful. This is especially wasteful because some of the Docker networks only contain two hosts, but are (from memory) a /24 or maybe even a /20.
Most VM hypervisors allow the creation of virtual networks that are only visible to the VMs on the network, not to the host. The connection of the host to the virtual network is an extra step involving a TAP driver that translates between the host OS and the hypervisor process. Obviously, the virtual networks themselves, when the TAP driver is not loaded, are not going to mess with the host's routing configuration like a (normal, rootful) docker deployment would, by virtue of being invisible to the host's networking stack.
Also, yes, it is a docker problem in particular. Linux has a solution for virtualizing networking for a subset of processes only: network namespaces. Docker doesn't use them by default, but can be taught to do so with the rootless kit. All rootless container engines use them by default.
The root cause is the ridiculously small private address space in IPv4. Conflicts have a non-negligible probability. You might even want to use a container on the train, so the daemon is not the root cause of problems in this case.
Using IPv6 would have reduced the probability a lot. But the excuse for the last 20 years has been, why bother with learning something new as long as "it works for me"... (I don't claim I would do differently.)
I won't address the IPv6 landmine here, because that is its own can of worms, and there are plenty of real, legitimate criticisms for not adopting it.
That aside, while I do agree that small private IPv4 space availability is a real concern, I'd also argue that Docker choosing to make the default network size a /16 compounds this problem significantly. I've never had a workflow where a Docker network needed more than a /24, and most could get away with a /26 or /27 without it being considered an aggressive limitation of IP space. Assigning the default Docker network size to something much more reasonable for a development context would do wonders for limiting collisions like this in the first place.
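For rootful Docker, the pool that Compose networks are carved from can be changed via `default-address-pools` in `/etc/docker/daemon.json`. A sketch that hands out /24 networks from a single /16 instead of the much larger defaults (the base range here is an arbitrary example, not a recommendation):

```json
{
  "default-address-pools": [
    { "base": "10.201.0.0/16", "size": 24 }
  ]
}
```

With this in place, each auto-created network gets a /24 from 10.201.0.0/16, which both shrinks the footprint of each project and keeps Docker out of the 172.16.0.0/12 space entirely.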
I mean the "excuse" as you call it is more, "why bother with learning something which doesn't work when I could use the thing which works, even if it's imperfect". I'd bet the train wifi doesn't support IPv6.
I agree. It absolutely is IPv4. Hosts could use DHCPv6 with prefix delegation (DHCPv6-PD), and use that delegated /64 for its internal Docker bridge and get rid of NAT. And yes, you can still have your Netfilters stateful packet filtering in place! People are going through so much pain because they won't embrace the tools IPv6 gives you! This solution is 20 years old. God help you if your network is actually using 172.17.0.0/16. grumble grumble get off my lawn, kids.
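For reference, enabling IPv6 on the default Docker bridge is a daemon.json setting; the prefix below is from the documentation range (2001:db8::/32) - in practice you'd substitute a /64 from your delegated prefix:

```json
{
  "ipv6": true,
  "fixed-cidr-v6": "2001:db8:1::/64"
}
```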
I feel like LXD shot themselves in the foot with terminology. Whenever I got interested I would get confused between LXC and LXD, and lxc the command (part of LXD) and the lxd command.
Now it's even muddier with Canonical ... taking back (?) LXD.
Throw in the alternative vision (LX* containers are more like persistent VMs than ephemeral containers) and a lack of `<container-engine> pull app` and all that entails re: DevX and DeployX, it always felt like a mountain to get going.
The team behind Incus (the fork of LXD) is working on that problem. They have decided on unambiguous names and have even rearranged some commands to make it consistent and clear.
Reading this makes me concerned as someone who loves LXD, because when you reflect on it there's not really that much for newcomers besides some blog posts or a "fail and ask on the forums" model, which clearly isn't optimal for some learning styles. It's probably a misstep by the linux containers ecosystem; I do hope Incus can improve on this.
LXD's official online documentation (now hosted by Canonical) is the sole resource I needed to set it up and use it. The CLI concepts and commands are designed well enough and are intuitive. The only outside resource I had to read was for some privileged containers running K8s inside.
Have you tried podman? I've been quite happy using it on my fedora laptop, seems less finicky than Docker and their compose plugin has gotten much closer to docker-compose
podman is pretty good. However, it is more secure than docker by default, so with more complicated images benefiting from docker's "containers do not contain" properties things tend to break.
It’s a steep learning curve, but I find nix basically solves all the parts of these problems for me that I want solved (without things like network namespaces that just make life harder) and it’s easy to turn a working nix build into a container for deployment.
Nix isn't an OS and NixOS does "security concerns" just as well as any other distro. I'm really struggling to understand what you mean. Nix itself doesn't have any runtime sandboxing, but that's no different than using distro packages on any other distro. Meanwhile you can still use Flatpaks, docker, podman, kvm/libvirt, etc, just like on any distro.
As great as Nix is, a security update to something like glibc will necessitate a rebuild of the entire universe as a necessary consequence of its "immutable, hashed inputs" model. Guix has a feature called "grafts" that avoids some of this pain, but compromises the purity of the functional packaging model to do so.
Grafting is indeed a good solution for fast security updates, except that the way this is implemented in Guix depends on the maintainer of each package. This is indeed better than Nix which relies on asking every user to replace the dependencies.
This would automagically work by simply maintaining two trees of Nixpkgs: one with the cherry-picked security updates, and one which matches the latest set of cached packages. This way one can fully benefit from the cached packages while retaining the ability to pull in the latest security patches without rebuilding the world.
Unfortunately, rewriting Nixpkgs to fit the requirements of that automagic mechanism is a huge project, especially given the activity of Nixpkgs. Maintaining a fork of Nixpkgs that stays up to date while changing its inner workings cannot be carried by a single person.
My hope would be to push this to the Nixpkgs Architecture Team, while keeping them from making mistakes that would insert extra complexity and make this work more challenging.
Nixpkgs is a distribution, as is NixOS. Neither of those things is Nix, but it's an easy mistake to mentally correct when interpreting someone charitably. That said, GP didn't give you much to go off of.
As for security, it's worth noting here that there are Nix-native tools for generating MicroVMs as well, if that's what folks are after with Kata and Firecracker.
Nixpkgs isn’t really a distribution, but something relatively new (although a bit like FreeBSD ports and Gentoo’s portage): it’s a collection of build scripts designed to make it possible to build software cross-platform in a reproducible way.
Portage and *BSD ports systems are certainly software distributions. They're just source-based distributions.
Nixpkgs includes the same kind of reuse and integration and patching that you see in other kinds of software distributions, like Linux distros or Conda.
The ICE Wifi is using very exotic IPs for their gateway here, so this is not really Docker's fault.
For reference, this can be worked around by deleting your Docker networks, logging in and recreating the networks, which should pose no problems on a dev machine.
What are "exotic" IPs? By using "common" ones, would the probability of a conflict even grow?
I am not aware of any guidance on how to use private IPv4 addresses. In practice 192.168.1.0/24 seems to be the most commonly used one, so you might want to avoid that.
It's not that irrelevant, considering that podman managed to solve both problems - necessity of a daemon and keeping the default network namespace clean. That said, I don't want to take away the credit of Docker being the pioneers in their field (yes it existed before. But it wasn't this popular).
Isn't Podman only able to do this because of user namespaces, which are a very recent addition to Linux? I wonder how Podman will do, if that's the case, now that user namespaces are being turned off by default due to their security implications.
I always thought containerization - including docker - was the result of Linux namespaces (more so than even cgroups). Checking again, Linux namespaces were introduced in 2002. Docker was released more than a decade later - in 2013. I believe that Docker always used namespaces - that's how they achieved process isolation. But they didn't use it to its full potential initially - including network namespaces and pods.
> So I am looking for a cleaner container solution
You need to understand what containers are first. Containers are not one thing. They are an amalgam of different OS primitives designed to give you the maximum flexibility, control, and isolation for an application environment.
When you say Docker is "cluttering" your system with processes, you mean the daemon that is used to start and manage containers on your system. There are alternative container systems that don't use a daemon, and can run rootless, but they have some tradeoffs. They are also not nearly as portable or easy to use as Docker, as a whole.
Yes, it "occupies" IP space, by default. You can disable or reconfigure the networking aspect of container solutions, to either use a different subnet, or just use the host's native networking. But then you won't get network isolation for your containerized app, and you will probably complain that you can only run one process on a given port at a time, and without a firewall, people on the train will be attacking your containerized apps.
> So I am looking for a cleaner container solution.
There isn't such a thing as clean software. People like to generalize like this, but what it usually means when they say "clean" is "I want it to be magic, as simple as possible, do everything I could ever want, and to not have to think about it". Which is wanting to have your cake and eat it too. Either it does everything for you and it's complex, or you have to get your hands a little dirty and it's simple.
> One that feels more like a Linux tool that keeps the system intact
Case in point: you want it to maintain the system for you. Docker does that. The end result is what you call "clutter".
> and only runs when it runs.
You want a rootless daemonless container frontend, like Podman. Good luck getting it to work... Don't @ me when you find out it's a lot of extra effort that doesn't give you anything better than Docker did.
Kata containers is for service providers. Nobody really needs that level of isolation on their laptops.
I ran into the same issue. I've mapped docker to the 0.0.0.0/8 subnet which nobody uses. And before that I was using the 169.254.0.0/16 subnet which I've never actually needed for its real purpose.
It doesn't "spew stuff all over the place". Docker runs as a service. It's like complaining that after you install nginx it's constantly running. And it occupies port 80!
Yeah, that's the service you installed. Stop the service if you don't want it running.
dockerd should be the only process running if no other containers are running.
You can also have containers use host-mode networking (sharing the host's NICs instead of using bridged networking via virtual NICs) if local IP address pollution is a concern.
Kata is just a container runtime. Depending on how they implement their network and storage drivers, functionality should be mostly the same.
We use Kata Containers to create Firecracker VMs from Kubernetes. Works really well for us. Though I am hoping there will be a more specific solution for Firecracker, as we don't need any other runtimes (which kind of defeats the purpose of Kata).
Kata containers are a cool concept but can be a bit difficult to get started with.
Last time I tried it the standalone Docker/containerd integration wasn't working well, the project seemed to be more targeting deployment as part of a k8s cluster.
> Is there a good reason why they don't seem widely adopted?
TBH their docs aren't that great. There should probably be a 'curl | sh' solution to install it at the top of the readme followed by a '<run this command and you're in an ubuntu shell in kata!>' command right after.
Another issue is the lack of nested virtualization in EC2 instances that aren't the very expensive i3 metals. That turns this from a "it's a drop in replacement" to "I'm spending thousands of dollars on this".
The proposed solution to the nested virtualization problem (apart from somehow persuading Amazon to switch that on) is something called peer pods, where the containers run in separate AWS instances. Arranging the traffic between the peer pods and the instance which is acting as host is quite challenging and I've never seen this successfully deployed in production.
Yes. A proper package manager usually installs only signed packages.
That means the OS maintainer has usually verified the purpose of the package.
It gives quite a lot more trust than running arbitrary content as a shell script, without any third-party verification.
The exact same problem exists with the channel that you acquire the public key you trust from. You’re still fundamentally trusting HTTPS to the package provider - you’re just trusting it at a different point.
Usually the keyring is a separate package, which is also signed with a key that can be verified from multiple different sources.
Of course, if you are the target of a nation-state attack that fakes public keys from all sources by MITMing DNS and servers, you might end up with the wrong package.
I feel like "curl | sh is fine" has been explained so many times at this point idk how people still aren't on the same page. If you hate "curl | sh" so much I'm sure they can provide some other method of installation.
To add, Firecracker is an alternative to qemu like Kata Containers is an alternative to containerd or runc. But both focus on security by isolation, as you mentioned.
The ideal situation is you would never know you are using them, they'd just work as an extra security layer. I think they're quite a ways off doing this.
>Kata Containers is an open source project and community working to build a standard implementation of lightweight Virtual Machines (VMs) that feel and perform like containers, but provide the workload isolation and security advantages of VMs.
Ummm, so what do they do again? Sounds like marketing speak. "Like VMs but containers" doesn't tell me anything.
By running the containers in VMs, and attempting to make the VMs boot very quickly by carefully tuning qemu (or Firecracker) and the guest kernel. The main problem with this approach is surprisingly not the time overhead - Kubernetes is fscking slow at scheduling regular containers - but the memory overhead, since you need to allocate sufficient memory up front for the largest possible memory usage of the container. Most containers expect to get more memory from the system simply by doing sbrk/mmap, and VMs simply don't work this way.
Doesn't KVM work that way? You can give a VM "up to" a certain amount of memory but it doesn't allocate all that memory at first boot. iiuc that's because each VM is just a process which can increase its own heap.
Not really, no. The kernel has a certain overhead tracking pages and page tables, so adding extra memory and waiting for it to be swapped in isn't free. Plus you still have to account for the memory being used somehow. If you use cgroups - as is done most commonly - that will track the full memory allocated. Not to mention this would only solve half of the problem, you also have to think about what happens with munmap.
Something that I'd love to see someone develop is mutagen support. Running Docker on Mac isn't very optimised, let alone enjoyable, because of the amount of RW that's being done. Mutagen solves this nicely.
The difference here is that Kata Containers is OCI and CRI compliant - meaning that it can immediately be used with K8s, Nomad and possibly others. You get all the features of these orchestration platforms. LXC doesn't have that (it actually predates OCI and CRI).
There are other orchestration systems that can use LXC - LXD, libvirt, Proxmox, and may be others. Also, LXC doesn't have traditional virtualization - that's a feature of LXD using KVM. (Do you mean system containers, as opposed to regular app containers?)