Fat OCI images are a cultural problem (nixos.org)
80 points by soraminazuki on Nov 25, 2023 | hide | past | favorite | 50 comments


Firefox is (optionally) still distributed as a "distroless" thing called a tarball.

Ardour (my project) is (also) distributed in this way. You get a directory tree that contains the executable, all required dynamically-linked libraries and a bunch of non-code files. The actual program itself is a tiny shell-script that sets LD_LIBRARY_PATH to point into the app directory tree. The result runs on any Linux distro that can provide libstdc++ and Xlib.
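The launcher described above can be sketched as follows. This is an illustrative sketch, not Ardour's actual script or layout; a fake bundle is created here so the whole thing runs end to end.

```shell
#!/bin/sh
# Build a throwaway "bundle" tree standing in for the real app directory.
BUNDLE_DIR="$(mktemp -d)"
mkdir -p "$BUNDLE_DIR/bin" "$BUNDLE_DIR/lib"
printf '#!/bin/sh\necho "launched with LD_LIBRARY_PATH=$LD_LIBRARY_PATH"\n' \
    > "$BUNDLE_DIR/bin/app-real"
chmod +x "$BUNDLE_DIR/bin/app-real"

# The launcher logic: prepend the bundled lib dir so the runtime linker
# resolves the shipped .so files first, then hand off to the real binary.
LD_LIBRARY_PATH="$BUNDLE_DIR/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
"$BUNDLE_DIR/bin/app-real"
```

In a real bundle the launcher would `exec` the binary so no extra shell process lingers.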

Except, ironically, NixOS (because they screw with LD_LIBRARY_PATH use).

Another way of doing this, which has been around on Unix for decades, is to use the linker's rpath argument. But for a long time the runtime linker on many Unix systems could not handle relative rpaths, and so the mechanism fell into disuse.

And of course, macOS has been doing something very similar with its "app" packaging, which relies on launchd to set DYLD_LIBRARY_PATH (their equivalent to LD_LIBRARY_PATH) to include the app tree itself. The result is largely independent of the version of macOS the install occurs on, although that depends on some additional behind-the-scenes magic in the macOS runtime linker.


FYI, that's not how macOS app bundles work. While ELF binaries typically list the shared libraries they depend on with just a filename, Mach-O binaries use a path. This can be an absolute path, which is typically used for OS-provided libraries. Or it can be a path relative to the binary containing the dependency, which is typically used for libraries contained in app bundles. (See the bottom of `man 1 dyld` for more info.) Thus there's no need to set environment variables just to run an app, and launchd doesn't do so.

There is a variable called DYLD_LIBRARY_PATH, but it mostly exists for development/testing purposes. The way it works is that before the dynamic linker resolves a dependency path from a binary, it first searches DYLD_LIBRARY_PATH for a file matching the basename of the path, ignoring the rest of the path. Any such file will take precedence over the original path.

I have seen some apps that rely on a wrapper shell script that sets DYLD_LIBRARY_PATH. I'm not sure if they just have the wrong dependency paths, or if they're dealing with other special cases like calls to dlopen with a bare filename.


You are absolutely right and I was talking out of my rear end. I know you are right because setting the path in the Mach-O binary is a major step in building the Ardour macOS packages.

That said, I still have this vague memory that launchd does in fact reset some linker-related vars before exec-ing the app binary. Could be wrong about that too.


LD_LIBRARY_PATH should work as expected on NixOS. If I may guess, the problem is that you don't have libc / libstdc++ / Xlib included in your LD_LIBRARY_PATH.


I no longer recall the precise reason. I gave up caring when it became clear that of all the myriad Linux distributions, NixOS is the only one that prevents this model from working in precisely the way it works on every other distro.


I mean, back in the netheryears when we only had chroot on Linux and no “containers”, we would often compile tight statically-linked mini filesystems that only hosted a single application like Apache (potentially with PHP support baked in). I don’t see a technical reason why we can’t still do this.


We absolutely can; it's just easier to start from a distro image, install everything you might need, build the software, and call it a job done.

Multi stage images into a scratch final stage fix this, but you have to care about the problem first, and many people don't. Software with many unpredictable dependencies also makes it harder (it's a lot harder to make a really thin image if your program is python, for instance, than if it's compiled).
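The multi-stage-into-scratch pattern mentioned above can be sketched like this for a compiled language. A hypothetical example; the module layout, image tags, and binary names are illustrative:

```dockerfile
# Build stage: full toolchain, never shipped.
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
# Static binary so the scratch stage needs no libc.
RUN CGO_ENABLED=0 go build -o /out/server .

# Final stage: empty filesystem plus the one binary.
FROM scratch
COPY --from=build /out/server /server
ENTRYPOINT ["/server"]
```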

Edit: actually reading TFA, it's a really good description of the issue. It focuses on nix, but the problem and difficulties are much more general.


The point is that people start from distro images not because it's easy. Rather, it's because there's no other way with Docker. It doesn't have to be this way with better tools. If you use a tool that has a clear view of the dependencies of your software to build your image, "distroless" images would be the default. The pain of multi stage images is a Docker-induced problem, not a fundamental issue with packaging in general.


If I'm understanding what you wrote here, that is not true.

You can write a Dockerfile that inherits from "scratch," which projects an almost entirely empty overlay volume into the container's root volume. (I think it writes a .dockerenv file still.) From there, you can add a minimal rootfs that provides exactly just the system libraries your app needs to run.

This is the crux of the problem and a big part of the argument in this article: many devs don't know which dependencies they need from the system, so they start from the biggest Ubuntu base images they can find and then apt-get their way into a working system.

This works, but now you have a 3 GiB image with a ton of stuff outside of your app's core supply chain that introduces risk.


I'm not sure which part you're saying is wrong. Your points seem to reaffirm what I wrote.


> The point is that people start from distro images not because it's easy. Rather, it's because there's no other way with Docker.

It depends on whether the process you are going to run is compiled or interpreted. In the latter case you can use multi stage builds, with the former stages building what you need and the final one putting it all together. But it simply makes no sense for cases like Python, where it would be terribly inefficient to repeat the same work everybody else is doing, hence the FROM python:3.11-slim-bookworm usage.


I think you meant "former case" rather than "latter case", but even with compiled programs it can be a little more complex, particularly for a dynamically linked program and especially if your program or libraries use dynamic loading. OpenSSL can do some dynamic loading and depend on environment variables and filesystem artifacts in surprising ways as well. If your program uses any hardcoded paths, it gets more complex as well.

I find it pretty easy to write a program to run in a very minimal docker image, but compiling existing software for the same can be quite hard, and legacy libraries and application code make it even harder. For somebody who is just learning to program, they could easily never really learn the right techniques, because the wrong ones really do just work fine for them. Like electron, these things are very wasteful, but they do work pretty reliably.


The technical reason is that while chroot does limit filesystem access, it does not limit or isolate other types of system calls. So for example even though said Apache with PHP would not be able to overwrite your /etc/passwd, it can still interact with the network (incl. localhost and LAN). Of course there are new ways to mitigate this with capabilities and seccomp and BPF and so on, but this didn't exist back in the "netheryears" when chroot was still the state of the art. Adding SELinux was quite a bit better, and Android even runs on SELinux to this day, but it still doesn't reach what Docker can achieve now, with things like cgroups.

To be clear, Docker doesn't exist because it's impossible to properly configure a contained environment for applications already; Docker has to be implemented somehow, after all, and you could just do what it does and have yourself a container. Docker exists because people want a simple way to launch a container that is already fully contained - filesystem, network, CPUs/GPUs, etc - and doing this configuration manually is Work™.


Oh, definitely, but I’m not sure how any of the other cgroups isolation features have anything to do with the massive bloated filesystems that often end up stuffed into containers. Like… I’m pretty sure I could take the old process of building a tiny self-contained Apache+mod_php filesystem and wrap it up in a Docker container to get the rest of the cgroup magic, but no one does it this way.


Not sure, I've always used docker for statically linked executables so I never needed an entire operating system inside. The issue comes when using anyone else's containers.


Totally possible. I think Habitat by Chef tried to do an easier version of this, and many teams on Golang stacks are effectively doing this with Docker today (images with nothing in them except the app).

The issue is that if you're an engineer coming up with the toolchains for your packaging and you say "I opted to write scripts for packaging our stuff as tarballs and running the app in chroots", your lead will ask "why are you wasting time on that and not using Docker?" And it will probably be a difficult conversation.


We honestly even have a better version of this these days with systemd-nspawn, and it's OCI runtime-compatible.


Having used nix-built OCI images in prod, I feel qualified to comment:

* Yeah, nix defaults to giving you a "distroless" image. This is one of the nice things about it :)

* The space savings turn out to not be that significant compared to just using multi-stage Dockerfile builds - if your image is >100MB of application and associated necessary data files, paying another 3-30MB (alpine is 3, debian slim is 30) for a base image just doesn't matter that much. Compared to Alpine, nix could even give you a bigger image, though I don't recall seeing it happen.

* (Similar to previous point) As the Discourse thread notes, nixpkgs isn't really geared towards producing small outputs; it's more like Debian than Alpine.

* Not having anything but the application sounds good, but it has the very significant disadvantage that the result is (by design) missing coreutils and a shell, which means if your container breaks, you can't `kubectl exec` into it to troubleshoot. (And we never found a way to fix this; Ephemeral Containers looked like they were going to fix it but they don't share mounts with the target container and that really undermined the whole thing for us)


I think the way to look at this is that Nix allows users to decide what kind of image they want.

If I want a smaller package than what the official package offers, I can do just that: create a new package on the fly by extending existing package definitions. Disabling features, adding compile flags, dropping dependencies, and patching the source of existing packages are all very easy to do in Nix. With Debian or Alpine, I have to rebuild software from scratch if some package doesn't fit my needs. With Nix, it's only a few lines of code. For example:

    nginx.override { withPerl = false; }
Nix also lets me choose what I want or don't want to include in my image. A Nix-built image is just another Nix "package," so it has enough information to include only the things I need. Docker can't do that because it has no insight whatsoever into the files included in an image. So I'm left guessing which files are needed and which aren't, and have to tediously copy them into a completely new image.
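A hedged sketch of what this looks like in practice, assuming nixpkgs' `dockerTools` (the image name and command are illustrative). The resulting image contains only the package's runtime closure, nothing else:

```nix
{ pkgs ? import <nixpkgs> {} }:
# buildLayeredImage computes the closure of `contents` and packs exactly
# those store paths into the image layers.
pkgs.dockerTools.buildLayeredImage {
  name = "nginx-distroless";
  contents = [ (pkgs.nginx.override { withPerl = false; }) ];
  config.Cmd = [ "/bin/nginx" "-g" "daemon off;" ];
}
```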

Finally, Nix lets me choose how I want to run my applications, including whether I even want to run them in containers at all. Way too often, Docker-native applications become so tied to their image that they can't be run any other way. If I want to change some aspect of the environment an application runs in, I have to fork the whole thing. Worse, when it comes to building Docker images, there are no reproducibility guarantees. With Dockerfiles being essentially no more than shell scripts for making tarballs, they're very much a "works in my environment" way of building software.


Adding `busybox` and `bashInteractive` to the container contents gives you enough of a comfortable environment to work in without losing too much space.


Is that still distroless? At that point you've effectively recreated Alpine with nix IMO; the only thing missing from the image is a package manager.


Maybe not, but so what? If you want some debugging tools, add the debugging tools. Who cares if it is "distroless" anymore? I think the main point of "distroless" images is bringing only what you need; if you need debugging tools, then bring them.


This is what the folks at Chainguard are solving with their Wolfi OS: https://github.com/wolfi-dev/os and tools like melange: https://github.com/chainguard-dev/melange


This rant is all well and good, but the folks who really need to hear this are the sales and marketing folks at the third-party software vendors who happily burble that “we’re containerizing/containerized!”. Then you peer into their implementation only to find out they popped a VM into that sucker and called it a day. When you get on a call with them, you find out that really containerizing their product (of which going “distroless” is only one of the many, many re-architecting and re-designing steps their product should go through) is nowhere on their roadmaps, because too few customers are demanding it.

We might need an “X Factors” list to popularize what checkboxes properly “containerized” products should tick, to help move the needle.


Distroless also has variants that bundle runtimes. E.g. `gcr.io/distroless/python3-debian11` is a variant with Python packages.

This makes it convenient to build and package some Python libraries that require a build step. The process looks like this:

1. Use `debian11` as the build env and install `libpython3-dev`.

2. Pip install the required packages into a prefix.

3. Use `python3-debian11` as the final base image and copy the installed files from the build env.
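The steps above can be sketched as a Dockerfile. This is a hedged sketch, not the linked repo's actual build; the package set, Python version, and PYTHONPATH handling are illustrative assumptions:

```dockerfile
# Step 1: full Debian as the build environment.
FROM debian:11 AS build
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3-pip libpython3-dev gcc
# Step 2: install into a prefix we can copy wholesale.
RUN pip3 install --prefix=/install fava

# Step 3: distroless runtime with the Python interpreter baked in.
FROM gcr.io/distroless/python3-debian11
COPY --from=build /install /install
ENV PYTHONPATH=/install/lib/python3.9/site-packages
ENTRYPOINT ["python3", "/install/bin/fava"]
```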

Obviously there are other ways to achieve the same goal, but distroless being Debian based makes it easy to rule out any compatibility issues, if you also use Debian as your build env.

For a concrete example here's how I package beancount + fava: https://github.com/yegle/fava-docker


This post rings a bell for me. I fell into the same trap, and saw teams fall into it too: treating Docker as a tool for dependency management, since it is very easy to use a FROM statement.

So the trap for me is: mixing up dependency management with Docker, which is just a wrapper for your runtime in 2023™ .

There are already tried and proven tools out there (deb, rpm) which solve the dependency management problem. Better to publish your artifacts as a package backed by a dependency manager, and then use the manager's tools to make an OCI image as a target.

Using Docker images for dependency management does not work. Your FROM lines will explode with all the combinations you will have to keep in the registry (e.g. „I need nodejs with a C++ image library“).

Docker multi-stage builds on top seem to confuse people further: they mistake Docker for a CI system, copying artifacts between their multi-stage builds and maybe even implementing their own caching behavior.

Therefore I suggest clearly separating the responsibilities of Docker from those of a CI runner and of a dependency manager. Otherwise patch management and dependency resolution won't scale across multiple teams.


Meh.

Back when they called containers "jails", I did both distroless/single-binary and full-install containers. I like the former a lot. They really make it difficult for attackers to live off the land. On the flip side, they're a huge pain in the ass to maintain, particularly if your programming language of choice doesn't make static linking easy.

Nowadays, where system admins aren't usually programmers, doing a full install of an operating system's userland into a chroot---sorry, I mean "layer"---is pretty damn handy. If you're careful about what packages you install, you can get most of the benefits of a distroless container (e.g., making LoL hard) without having to know too much about how the runtime environment works.

It's all a balance, how much time and money you waste engineering the perfect `FROM scratch` container versus getting actual work done using a "fat" image that's good enough. And I think we all know, deep down, that fat OCI images make the Docker world go 'round.


> It's all a balance, how much time and money you waste engineering the perfect `FROM scratch` container versus getting actual work done using a "fat" image that's good enough.

Except it never works this way. Most of the time you get told the image won't be used so to do it quickly and forget optimizing. Months later you find other teams "borrow" it and suddenly the whole company is using it.


> It's all a balance, how much time and money you waste engineering the perfect `FROM scratch` container versus getting actual work done using a "fat" image that's good enough

I’ve toyed with buildpacks and when the builder images don’t suck, and the project isn’t doing weird shit with modules, it seems to work with little effort.


The point OP makes, that building distroless containers isn't hard, is quite true. I put together a script (~350 lines) that uses multi-stage builds on Gentoo. It doesn't take much effort to match the small Alpine-based "official" containers. On top of that, Gentoo provides excellent customisability: things like nginx or PHP, which have a huge number of build options, let me build containers as small or as big as I need. My script also allows arbitrary combinations of packages, so if you want a shell, a debugger, or whatever else, you can easily add them in.

The point is, it was a weekend experiment. It's by no means ideal, but it's absolutely functional and viable. With a starting investment of half an engineer-week, it's affordable for pretty much any org.


I agree with everything in that post. The problem I have, when I even try to get someone to use Alpine (which is still a full OS, just small), is that no one cares. Developers don’t care that they have a 3GB image. They don’t want to learn about any of it. Every time a developer has an issue with Docker, it comes right to the infra team, as if we own the image that they completely wrote and that is now broken because they pinned it to latest.

We are in the nodejs world now where speed is all that matters. No one wants to understand what is going on under the hood. And no one takes the time to learn something new when your job only cares about the next feature. At the end of the day all that matters is money.


This is how everyone works everywhere. The accountant doesn’t want to learn how windows services work, they just want to do their work. The c level obviously doesn’t care how anything works, they just want their data. Expecting people to care about things outside of their domain is a losing battle. Better to focus on taking the part that isn’t their domain out of their hands so better choices can be made.


> doesn’t want to learn how windows services work, they just want to do their work

Is that actually true or is the company just so focused on short term metrics that it actively discourages this behavior?


Run a security scanner against that 3 GB image and tell the devs they have 1,000 CVEs to fix before they can deploy.


That’s a great way to convince the business to just ignore everything the security team says.


Or to implement optimized container images built with best security practices. That’s the point of his argument btw.


Yes, I get that. My point is that you won’t accomplish that by holding the business hostage.


> Developers don’t care that they have a 3GB image. They don’t want to learn about any of it.

In a way it's not their fault. Management never allocates time for such things; it isn't in OKRs / KPIs. No one cares because they're not made to care. I've seen developers who tried get "punished" for it, so they stop caring.

I've seen people spend their own time to improve everyone's life, e.g. CI/CD build times. The end result is management complaining about why they aren't working on the "priority," i.e. the requested features.

> No one wants to understand what is going on under the hood.

That's always been the case because "business" took over. This notion of "tech debt" is one of the worst offenders: to non-technical people, debt is something that can be left forever (as long as you pay the interest), i.e. they'll never care.


But then developers need to learn to play that game. What is the business/financial reason to create small images? Can you come up with a KPI that makes it clear to the whole business that creating smaller images is important? (e.g. infra costs).

If not, then really, why should anyone care about image size? If it doesn't have a meaningful impact on company results, it's really a form of self-gratification, isn't it?


The whole docker/container craze was supposed to fix "works for me".


Yes, and no.

Docker images can still drift in their layer deltas if one doesn't pin the system package versions of the applications being installed, even if the instructions that generate those layers remain the same.

The output being an image does not make all iterations of those instructions the same thing.


You almost have to get to people before they learn bad habits. When I started creating containers for my company’s projects, I started with alpine and managed to sneak it in as the standard for everything. The developers didn’t care how it worked and it wasn’t hard to get our systems critter on board with the plan to keep images small. Requiring a justification for a non-alpine image helps keep people honest.


Agreed; often nobody really notices the underlying issues until something breaks in production. It's a tough balance to strike between moving fast and being thorough.


OCI = Open Container Initiative https://opencontainers.org

NixOS = a Linux distribution built around the Nix package manager (https://nixos.org)


Why not unikernels?


Submitters: please don't editorialize titles - This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html.

If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...

(Submitted title was "Fat OCI images are a cultural problem")


It seems like submitter might have meant to submit this reply post [0] rather than the root post, and the tl;dr was "Fat OCI images are a cultural problem, not a technical one."

[0] https://discourse.nixos.org/t/oci-images-is-there-something-...


Fixed now. Thanks!


Actually, that was the link I submitted. I used the tl;dr because the root post addressed a different topic from the post I was linking to.


Ah, I see - what happened is that our software replaced the URL with https://discourse.nixos.org/t/oci-images-is-there-something-... because that is listed as the canonical URL for the page you submitted.

I've changed the URL and title now. Sorry for the misunderstanding!



