What I don't like about Docker is that it spews stuff all over the place.
After installation, it is constantly running. As can be seen by:
ps aux | grep docker
And it occupies IPs:
ip addr
I was on a train in Germany recently and could not use the WiFi because of that: it turned out the Docker daemon occupied the IP range the train WiFi uses, while I was not using Docker at all.
I guess it clutters even more stuff. Any suggestions what else to look for?
So I am looking for a cleaner container solution. One that feels more like a Linux tool that keeps the system intact and only runs when it runs.
If Kata is such a tool, I would look into it more closely.
Podman is probably close to what you want. It runs "daemonless" - while no containers are running, podman doesn't have a running process either. Also, as long as the containers are run in rootless mode, podman creates no virtual network interfaces. Rootless Podman makes use of network namespaces instead to separate the processes in the container from the host network. Processes on the host cannot see Podman's network namespaces and therefore are completely unaffected by any IP configuration therein.
Podman is what Docker should have been, for me. Security first, no daemon, more Linux-like behavior (you can manage them with systemd unit files if you wish), and it supports the same, usual container images you build/use with Docker.
The main part it was lacking is the compose equivalent, but that too is coming along.
(It's fun to note how systemd was an epitome of un-Linux-like software 7-8 years ago, and now it's the opposite. I'm not talking about systemd's merits here, just about the change in perception.)
Comparatively little harsh criticism has been aimed at systemd the init system; most has concerned systemd-the-almost-NTP-client, systemd-the-binary-logfile, the various related XML documents, things like that.
Honestly with quadlet it might be there on the compose front: being able to deploy either as systemd-like files or as Kubernetes manifests probably solves the entire problem in a very nice way (the K8S compatibility is the real magic IMO since it's the defacto cloud ecosystem).
I haven't played with any of Podman's Kubernetes YAML stuff yet, but we target Kubernetes.
Does it support higher-level declarations like Deployments and StatefulSets? I'm trying to understand how/if we could use this without having to write new manifests. A (very) quick search didn't clarify it for me.
Quadlets just create a systemd unit file to launch containers with podman and have systemd manage its lifetime. Since systemd lacks the ability of controllers like Deployment and StatefulSets, I doubt that quadlets are able to achieve much more.
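For concreteness, a quadlet file is just an ini-style unit dropped into `~/.config/containers/systemd/` (for rootless use); a minimal sketch, with a hypothetical name and image:

```ini
# ~/.config/containers/systemd/web.container
[Unit]
Description=Example web server container

[Container]
Image=docker.io/library/nginx:latest
PublishPort=8080:80

[Install]
WantedBy=default.target
```

After `systemctl --user daemon-reload`, quadlet generates a `web.service` unit you can start and enable like any other systemd service.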
I'm pretty sure, that cleverly combining various unit types and their capabilities/attributes would allow to cover 90% of what's needed to emulate Deployments and StatefulSets.
I find Podman frustrating at times because of how strict it is with ACLs, UID maps and its SELinux integration. I've had a chown/chmod -R of my own home folder (for migration purposes) cripple all podman (AND toolbox) related stuff for my user.
All in all it's a great program and IMO even better than Docker, but it would be great if people didn't try to make it sound like it's a 1:1 replacement for Docker, because it has its trade-offs.
I would argue that the selinux and uid issues with podman are selinux and posix DAC’s fault.
SELinux is neat but is IMO conceptually wrong in a container world. UIDs are barely better. At most, these mechanisms should confirm that the container as a whole has a given permission, and that's it.
(Seriously, the major clouds have deprecated per-object permissions on their object stores. IMO they are right to have done so.)
It might feel frustrating, but there is a great reason.
It gives you a more secure environment.
When Docker just "works", you are unaware that you have granted more permissions than the app might actually need, increasing the attack surface.
Podman's use of SELinux prevents or mitigates many of the issues you would face if someone targets your app.
To be honest I never even tried out Podman, even though I keep reading positive things about it. But the entire Docker ecosystem is complex enough on its own, so I fear that exchanging the container runtime entirely will cause a lot of problems that I don't know about today.
Podman has docker-compatible CLI, Dockerfiles and even docker-compose compatibility. It won't be as painful as you worry. So just give it a try. And besides the advantages already mentioned (rootless, no-daemon and isolated network namespaces), podman has some additional nifty features like pods (like in K8s), quadlets (containers managed by systemd), compatibility with k8s manifests, etc.
undeniable that you'll have a thing or two pop up—whether or not it's worth it depends on how much you want to understand. despite differences in how it works, podman _is_ simpler.
the reality though is that you can install both docker and podman, and just start/stop docker as necessary. it's easy enough to experiment with podman on a system with docker installed.
imo it's similar to folks that think learning a new shell is a huge undertaking. the reality is you just install the new thing and drop in and out of it as you get comfortable. if it sticks, cool, if not, also cool.
you can also use docker rootless if you want that style of networking. Indeed AFAIK docker and podman rootless both use the same networking approach (slirp4netns)
I find this take so interesting. Here's a tool so faultless that your biggest gripe with it is some obscure interaction 99.99% of people won't even be aware of, and you want to replace the tool because of it.
This would be the equivalent of me buying a great car and disliking the fact that when you pull the carpets up there's a hard to reach clip that needs to be undone, so I decide to sell it and get a different car.
I see this happen with all great tools. It irons out all the important kinks, and people still find some obscure reason to fault the tool enough to switch.
(To be clear, I am aware Docker has issues beyond what GP is dealing with)
(Apologies for the strong tone of the comment, it's not intended, but could not find a better way to word it)
That is not an obscure issue. The common manifestation - Docker rewriting iptables rules when you expose a port, messing up the software firewall - has caused a lot of wasted hours and security issues.
You might be used to seeing addresses from 192.168.0.0/24 and 192.168.1.0/24 in home networks, and addresses from 10.x.y.0/24 in corporate internal networks.
But all of 172.16.0.0/12 has exactly the same kind of purpose as do 10.0.0.0/8 and 192.168.0.0/16.
The people who set up the network on the train did nothing wrong by using a subnet of 172.16.0.0/12.
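As a quick sanity check, Python's standard ipaddress module can confirm that Docker's usual default bridge subnet sits squarely inside one of the RFC 1918 private ranges:

```python
import ipaddress

# Docker's default bridge (docker0) typically sits at 172.17.0.0/16.
docker_bridge = ipaddress.ip_network("172.17.0.0/16")

# The three RFC 1918 private ranges:
rfc1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

for net in rfc1918:
    if docker_bridge.subnet_of(net):
        print(f"{docker_bridge} lies inside {net}")
# → 172.17.0.0/16 lies inside 172.16.0.0/12
```

So any network (like the train WiFi) legitimately using 172.16.0.0/12 space can collide with a default Docker install.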
Does anyone configure other IP ranges in Docker? I know there are other reserved IP ranges you might get away with. There's the CGNAT IP range 100.64.0.0/10, and there's the link-local IP range 169.254.0.0/16. These are unused in most situations and may work fine for Docker networks.
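The appeal of those two ranges is that they sit entirely outside RFC 1918 space, so they can't collide with a typical home or corporate LAN - easy to verify with the ipaddress module:

```python
import ipaddress

# Ranges sometimes repurposed for container networks:
cgnat = ipaddress.ip_network("100.64.0.0/10")        # carrier-grade NAT, RFC 6598
link_local = ipaddress.ip_network("169.254.0.0/16")  # link-local, RFC 3927

rfc1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Neither overlaps the private ranges a LAN is likely to use:
for candidate in (cgnat, link_local):
    assert not any(candidate.overlaps(net) for net in rfc1918)
    print(f"{candidate}: no overlap with RFC 1918 space")
```

Of course, you'd still collide if your ISP actually uses CGNAT addressing on your uplink, so this is a pragmatic workaround rather than a guarantee.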
Docker Compose creates a new network for every project, and eventually overlaps with something important. They are fairly large ranges by default, so you end up taking up a lot of address space fast if you're not careful. This is especially wasteful because some of the Docker networks only contain two hosts, but are (from memory) a /24 or maybe even a /20.
Most VM hypervisors allow the creation of virtual networks that are only visible to the VMs on the network, not to the host. The connection of the host to the virtual network is an extra step involving a TAP driver that translates between the host OS and the hypervisor process. Obviously, the virtual networks themselves, when the TAP driver is not loaded, are not going to mess with the host's routing configuration like a (normal, rootful) docker deployment would, by virtue of being invisible to the host's networking stack.
Also, yes, it is a docker problem in particular. Linux has a solution for virtualizing networking for a subset of processes only: network namespaces. Docker doesn't use them by default, but can be taught to do so with the rootless kit. All rootless container engines use them by default.
The root cause is the ridiculously small private address space in IPv4. Conflicts have a non-negligible probability. You might even want to use a container on the train, so the daemon is not the root cause of problems in this case.
Using IPv6 would have reduced the probability a lot. But the excuse for the last 20 years has been, why bother with learning something new as long as "it works for me"... (I don't claim I would do differently.)
I won't address the IPv6 landmine here, because that is its own can of worms, and there are plenty of real, legitimate criticisms for not adopting it.
That aside, while I do agree that small private IPv4 space availability is a real concern, I'd also argue that Docker choosing to make the default network size a /16 compounds this problem significantly. I've never had a workflow where a Docker network needed more than a /24, and most could get away with a /26 or /27 without it being considered an aggressive limitation of IP space. Assigning the default Docker network size to something much more reasonable for a development context would do wonders for limiting collisions like this in the first place.
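For rootful Docker, the pool that Compose networks are carved from can be changed via `default-address-pools` in `/etc/docker/daemon.json`. A sketch that hands out /24 networks from a single /16 instead of the much larger defaults (the base range here is an arbitrary example, not a recommendation):

```json
{
  "default-address-pools": [
    { "base": "10.201.0.0/16", "size": 24 }
  ]
}
```

With this in place, each auto-created network gets a /24 from 10.201.0.0/16, which both shrinks the footprint of each project and keeps Docker out of the 172.16.0.0/12 space entirely.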
I mean the "excuse" as you call it is more, "why bother with learning something which doesn't work when I could use the thing which works, even if it's imperfect". I'd bet the train wifi doesn't support IPv6.
I agree. It absolutely is IPv4. Hosts could use DHCPv6 with prefix delegation (DHCPv6-PD), and use that delegated /64 for its internal Docker bridge and get rid of NAT. And yes, you can still have your Netfilters stateful packet filtering in place! People are going through so much pain because they won't embrace the tools IPv6 gives you! This solution is 20 years old. God help you if your network is actually using 172.17.0.0/16. grumble grumble get off my lawn, kids.
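For reference, enabling IPv6 on the default Docker bridge is a daemon.json setting; the prefix below is from the documentation range (2001:db8::/32) - in practice you'd substitute a /64 from your delegated prefix:

```json
{
  "ipv6": true,
  "fixed-cidr-v6": "2001:db8:1::/64"
}
```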
I feel like LXD shot themselves in the foot with terminology. Whenever I got interested I would get confused between LXC and LXD, and lxc the command (part of LXD) and the lxd command.
Now it's even muddier with Canonical ... taking back (?) LXD.
Throw in the alternative vision (LX* containers are more like persistent VMs than ephemeral containers) and a lack of `<container-engine> pull app` and all that entails re: DevX and DeployX, it always felt like a mountain to get going.
The team behind Incus (the fork of LXD) is working on that problem. They have decided on unambiguous names and have even rearranged some commands to make it consistent and clear.
Reading this makes me concerned as someone who loves LXD, because when you reflect on it there's not really that much for newcomers besides some blog posts or a "fail and ask on the forums" model, which clearly isn't optimal for some learning styles. It's probably a misstep by the linux containers ecosystem; I do hope Incus can improve on this.
LXD's official online documentation (now hosted by Canonical) is the sole resource I needed to set it up and use it. The CLI concepts and commands are designed well enough and are intuitive. The only outside resource I had to read was for some privileged containers running K8s inside.
Have you tried podman? I've been quite happy using it on my fedora laptop, seems less finicky than Docker and their compose plugin has gotten much closer to docker-compose
podman is pretty good. However, it is more secure than docker by default, so with more complicated images benefiting from docker's "containers do not contain" properties things tend to break.
It’s a steep learning curve, but I find nix basically solves all the parts of these problems for me that I want solved (without things like network namespaces that just make life harder) and it’s easy to turn a working nix build into a container for deployment.
Nix isn't an OS and NixOS does "security concerns" just as well as any other distro. I'm really struggling to understand what you mean. Nix itself doesn't have any runtime sandboxing, but that's no different than using distro packages on any other distro. Meanwhile you can still use Flatpaks, docker, podman, kvm/libvirt, etc, just like on any distro.
As great as Nix is, a security update to something like glibc will necessitate a rebuild of the entire universe as a necessary consequence of its "immutable, hashed inputs" model. Guix has a feature called "grafts" that avoids some of this pain, but compromises the purity of the functional packaging model to do so.
Grafting is indeed a good solution for fast security updates, except that the way this is implemented in Guix depends on the maintainer of each package. This is indeed better than Nix which relies on asking every user to replace the dependencies.
This would automagically work by simply maintaining two trees of Nixpkgs: one with the cherry-picked security updates, and one which matches the latest set of cached packages. This way one can fully benefit from the cached packages while retaining the ability to pull in the latest security patches without rebuilding the world.
Unfortunately, rewriting Nixpkgs to fit the requirements of that automagic mechanism is a huge project, especially given the activity of Nixpkgs. Maintaining a fork of Nixpkgs that stays up to date while changing its inner workings cannot be carried by a single person.
My hope would be to push this to the Nixpkgs Architecture Team, while keeping them from making mistakes that would insert extra complexity and make this work more challenging.
Nixpkgs is a distribution, as is NixOS. Neither of those things is Nix, but it's an easy mistake to mentally correct when interpreting someone charitably. That said, GP didn't give you much to go off of.
As for security, it's worth noting here that there are Nix-native tools for generating MicroVMs as well, if that's what folks are after with Kata and Firecracker.
Nixpkgs isn’t really a distribution, but something relatively new (although a bit like FreeBSD ports and Gentoo’s portage): it’s a collection of build scripts designed to make it possible to build software cross-platform in a reproducible way.
Portage and *BSD ports systems are certainly software distributions. They're just source-based distributions.
Nixpkgs includes the same kind of reuse and integration and patching that you see in other kinds of software distributions, like Linux distros or Conda.
The ICE Wifi is using very exotic IPs for their gateway here, so this is not really Docker's fault.
For reference, this can be worked around by deleting your Docker networks, logging in and recreating the networks, which should pose no problems on a dev machine.
What are "exotic" IPs? By using "common" ones, would the probability of a conflict even grow?
I am not aware of any guidance on how to use private IPv4 addresses. In practice 192.168.1.0/24 seems to be the most commonly used one, so you might want to avoid that.
It's not that irrelevant, considering that podman managed to solve both problems - necessity of a daemon and keeping the default network namespace clean. That said, I don't want to take away the credit of Docker being the pioneers in their field (yes it existed before. But it wasn't this popular).
Isn't Podman only able to do this because of user namespaces, which are a very recent addition to Linux? I wonder how Podman will do, if that's the case, now that user namespaces are being turned off by default due to their security implications.
I always thought containerization - including docker - was the result of Linux namespaces (more so than even cgroups). Checking again, Linux namespaces were introduced in 2002. Docker was released more than a decade later - in 2013. I believe that Docker always used namespaces - that's how they achieved process isolation. But they didn't use it to its full potential initially - including network namespaces and pods.
> So I am looking for a cleaner container solution
You need to understand what containers are first. Containers are not one thing. They are an amalgam of different OS primitives designed to give you the maximum flexibility, control, and isolation for an application environment.
When you say Docker is "cluttering" your system with processes, you mean the daemon that is used to start and manage containers on your system. There are alternative container systems that don't use a daemon, and can run rootless, but they have some tradeoffs. They are also not nearly as portable or easy to use as Docker, as a whole.
Yes, it "occupies" IP space, by default. You can disable or reconfigure the networking aspect of container solutions, to either use a different subnet, or just use the host's native networking. But then you won't get network isolation for your containerized app, and you will probably complain that you can only run one process on a given port at a time, and without a firewall, people on the train will be attacking your containerized apps.
> So I am looking for a cleaner container solution.
There isn't such a thing as clean software. People like to generalize like this, but what it usually means when they say "clean" is "I want it to be magic, as simple as possible, do everything I could ever want, and to not have to think about it". Which is wanting to have your cake and eat it too. Either it does everything for you and it's complex, or you have to get your hands a little dirty and it's simple.
> One that feels more like a Linux tool that keeps the system intact
Case in point: you want it to maintain the system for you. Docker does that. The end result is what you call "clutter".
> and only runs when it runs.
You want a rootless daemonless container frontend, like Podman. Good luck getting it to work... Don't @ me when you find out it's a lot of extra effort that doesn't give you anything better than Docker did.
Kata containers is for service providers. Nobody really needs that level of isolation on their laptops.
I ran into the same issue. I've mapped docker to the 0.0.0.0/8 subnet which nobody uses. And before that I was using the 169.254.0.0/16 subnet which I've never actually needed for its real purpose.
It doesn't "spew stuff all over the place". Docker runs as a service. It's like complaining that after you install nginx it's constantly running. And it occupies port 80!
Yeah, that's the service you installed. Stop the service if you don't want it running.
dockerd should be the only process running if no other containers are running.
You can also have containers use host-mode networking (sharing the host's NICs instead of using bridged networking via virtual NICs) if local IP address pollution is a concern.
Kata is just a container runtime. Depending on how they implement their network and storage drivers, functionality should be mostly the same.
We use Kata Containers to create Firecracker VMs from Kubernetes. Works really well for us. Though I am hoping there will be a more specific solution for Firecracker, as we don't need any other runtimes (which kind of defeats the purpose of Kata).
Kata containers are a cool concept but can be a bit difficult to get started with.
Last time I tried it the standalone Docker/containerd integration wasn't working well, the project seemed to be more targeting deployment as part of a k8s cluster.
> Is there a good reason why they don't seem widely adopted?
TBH their docs aren't that great. There should probably be a 'curl | sh' solution to install it at the top of the readme followed by a '<run this command and you're in an ubuntu shell in kata!>' command right after.
Another issue is the lack of nested virtualization in EC2 instances that aren't the very expensive i3 metals. That turns this from a "it's a drop in replacement" to "I'm spending thousands of dollars on this".
The proposed solution to the nested virtualization problem (apart from somehow persuading Amazon to switch that on) is something called peer pods, where the containers run in separate AWS instances. Arranging the traffic between the peer pods and the instance which is acting as host is quite challenging and I've never seen this successfully deployed in production.
Yes. A proper package manager usually installs only signed packages.
That means the OS maintainer has usually verified the purpose of the package.
It gives quite a lot more trust than running arbitrary content as a shell script, without any third-party verification.
The exact same problem exists with the channel that you acquire the public key you trust from. You’re still fundamentally trusting HTTPS to the package provider - you’re just trusting it at a different point.
Usually the keyring is a separate package, which is also signed with a key that can be verified from multiple different sources.
Of course, if you are the target of a nation-state attack that fakes public keys from all sources by MITMing DNS and servers, you might end up with the wrong package.
I feel like "curl | sh is fine" has been explained so many times at this point idk how people still aren't on the same page. If you hate "curl | sh" so much I'm sure they can provide some other method of installation.
To add, Firecracker is an alternative to qemu like Kata Containers is an alternative to containerd or runc. But both focus on security by isolation, as you mentioned.
The ideal situation is you would never know you are using them, they'd just work as an extra security layer. I think they're quite a ways off doing this.
>Kata Containers is an open source project and community working to build a standard implementation of lightweight Virtual Machines (VMs) that feel and perform like containers, but provide the workload isolation and security advantages of VMs.
Ummm, so what do they do again? Sounds like marketing speak. "Like VMs but containers" doesn't tell me anything.
By running the containers in VMs, and attempting to make the VMs boot very quickly by carefully tuning qemu (or Firecracker) and the guest kernel. The main problem with this approach is surprisingly not the time overhead - Kubernetes is fscking slow at scheduling regular containers - but the memory overhead, since you need to allocate sufficient memory up front for the largest possible memory usage of the container. Most containers expect to get more memory from the system simply by doing sbrk/mmap, and VMs simply don't work this way.
Doesn't KVM work that way? You can give a VM "up to" a certain amount of memory but it doesn't allocate all that memory at first boot. iiuc that's because each VM is just a process which can increase its own heap.
Not really, no. The kernel has a certain overhead tracking pages and page tables, so adding extra memory and waiting for it to be swapped in isn't free. Plus you still have to account for the memory being used somehow. If you use cgroups - as is done most commonly - that will track the full memory allocated. Not to mention this would only solve half of the problem, you also have to think about what happens with munmap.
Something that I'd love to see someone develop is mutagen support. Running Docker on Mac isn't very optimised, let alone enjoyable, because of the amount of RW that's being done. Mutagen solves this nicely.
The difference here is that Kata Containers is OCI and CRI compliant - meaning that it can immediately be used with K8s, Nomad and possibly others. You get all the features of these orchestration platforms. LXC doesn't have that (it actually predates OCI and CRI).
There are other orchestration systems that can use LXC - LXD, libvirt, Proxmox, and may be others. Also, LXC doesn't have traditional virtualization - that's a feature of LXD using KVM. (Do you mean system containers, as opposed to regular app containers?)