Hey guys, we commented on another thread a few days ago about our tool Bismuth finding the bug (along with a SHA of our reproducer script as proof):
https://news.ycombinator.com/item?id=43489944
After disclosing it and corresponding with Gerlof, and judging from his post above, it looks like we did in fact nail it. I've just shared our write-up on how we found it.
Cool, thanks for adding it. It would also be nice if you posted how you generated the hash :) I’m not trying to be annoying but this is a critical part of how these hashes work; you post the hash early to indicate you have some information early and then later you demonstrate that by actually presenting the artifact with that hash. If you don’t publish the artifact so people can check that it is actually what you claim it is then your hash is worthless (as nobody can prove it’s not, like, the hash of a cat photo). And you’d generally want to demonstrate how you generated the hash just so people don’t have to figure out whether to md5 or sha1sum it.
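For anyone following along, the commit-then-reveal flow described above can be sketched in a few lines. This is only an illustration: it assumes SHA-256 (the thread doesn't say which algorithm was used), and the `artifact` bytes are a hypothetical stand-in for the real reproducer script.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def verify_commitment(artifact: bytes, published_hash: str) -> bool:
    """Check that a revealed artifact matches the hash that was posted earlier."""
    return sha256_hex(artifact) == published_hash.strip().lower()


# Hypothetical stand-in for the reproducer script's contents.
artifact = b"#!/usr/bin/env python3\nprint('reproducer placeholder')\n"

# Step 1: post only this value early on.
commitment = sha256_hex(artifact)

# Step 2: later, publish the artifact so anyone can recompute and compare.
matches = verify_commitment(artifact, commitment)
cat_photo_matches = verify_commitment(b"definitely a cat photo", commitment)
```

Stating the algorithm next to the hash saves readers from guessing between md5, sha1 and sha256 when they check it.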
This doesn't seem nearly as nefarious as the post from earlier this week indicated... I had expected a full supply chain compromise or something that bad based on the earlier post.
I was bit by atop a few years back and swore it off. I would get perfectly periodic 10m hangs on MySQL. Apparently they changed the default runtime options such that it used an expensive metric gathering technique with a 10m cron job that would hang any large memory process on the system. It was one of those “no freaking way” revelations after 3 days troubleshooting everything.
Interesting reading through the related submission comments and seeing other hard to troubleshoot bugs. I don’t think atop devs are to blame, my guess is that what you have to do to make a tool like atop work means you are hooking into lots of places that have potential to have unintended consequences.
I agree that the absence of tests isn't great, and is very common in C-based projects. But the rest of your comment reads like "ooh, it's C, disgusting!". I hope I'm wrong.
Thank you. These two are well known, as are plenty of others. But I wanted to see an answer from the author of the comment I replied to. Apart from tests (of which both sqlite and curl have plenty, and that is obviously good), I don't see any reasonable difference in the sqlite or curl code in the aspects mentioned in their comment (namely, style and ownership). I'd like to see what they think is reasonable C code.
Meh. This isn't a technology choice problem. Routine unix sockets are just some file in /tmp which an attacker could likewise open by racing against the daemon in the same way.
It's true you could use a privileged spot in the filesystem and set things up to use that by writing some simple extra software, but it's equally true that you could lock down a TCP socket to a specific process with about the same amount of work.
Bottom line is that you need to validate your input from outside the process if you're running in a privileged context[1], and atop didn't.
[1] It's not mentioned in the linked email, but I assume the core problem here (and the reason it got a CVE number) is that the atop binary is setuid?
> Routine unix sockets are just some file in /tmp which an attacker could likewise open by racing against the daemon in the same way.
So put the socket in /run instead of /tmp?
I'm no expert, but this appears to be where they belong, and it appears to solve the problem. From https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s15.htm... : "System programs that maintain transient UNIX-domain sockets must place them in this directory or an appropriate subdirectory as outlined above." ... "/run should not be writable for unprivileged users; it is a major security problem if any user can write in this directory."
Putting them in /run if you're not already root requires a little extra software be written though. Locking down a TCP socket isn't much harder. I'm not saying "don't use Unix domain sockets", I'm saying that treating this bug as the result of technology choice is bad security analysis.
Hmm, good point. I think we made opposite assumptions about that.
If the daemon does run as root, then no extra software is required. For Unix domain sockets, you can trivially create your socket in /run, and for TCP, you can trivially use a port below 1024.
If it doesn't, then some extra software or configuration is required in either case.
I tried looking it up, and I think it does run as root[1]. But I also found that the daemon uses a Python library to get GPU stats, and root might or might not be required depending on how the GPU software is configured[2]. So it could have gone either way.
These days Unix sockets for system daemons should be placed under /run with permissions that only a particular daemon can access for binding. With systemd service and socket units it is trivial to do.
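Without systemd, the same idea (a socket inside a directory only the daemon's user can enter) looks roughly like this. It's a sketch, not how atop or atopgpud does it: a temporary directory stands in for a subdirectory of /run so it can run unprivileged, and the name `daemon.sock` is made up.

```python
import os
import socket
import stat
import tempfile


def bind_private_unix_socket(directory: str, name: str = "daemon.sock") -> socket.socket:
    """Bind a listening Unix socket inside a directory only the owner can enter."""
    os.chmod(directory, 0o700)   # nobody else can traverse, create or replace entries
    path = os.path.join(directory, name)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)               # fails if something already exists at that path
    os.chmod(path, 0o600)        # connecting requires write permission on the socket
    srv.listen(1)
    return srv


# Stand-in for a real deployment path such as a subdirectory of /run.
run_dir = tempfile.mkdtemp(prefix="example-run-")
server = bind_private_unix_socket(run_dir)
sock_path = server.getsockname()

dir_mode = stat.S_IMODE(os.stat(run_dir).st_mode)
is_socket = stat.S_ISSOCK(os.stat(sock_path).st_mode)

# Clean up the demo files.
server.close()
os.unlink(sock_path)
os.rmdir(run_dir)
```

Because the directory is not writable by other users, nobody can race the daemon to bind that path first, which is the property the TCP port in this bug lacked.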
> but it's equally true that you could lock down a TCP socket to a specific process with about the same amount of work.
Can you educate me? I'm familiar with SO_PEERCRED that returns the user/group/pid on the other end. Would you then checksum the exe of the pid from /proc?
SO_PEERCRED is only for Unix domain sockets though; it's not going to work for TCP.
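For reference, reading SO_PEERCRED on Linux looks something like the sketch below. It uses a socketpair so that it runs standalone, which means the "peer" is this same process; in a real daemon you would call it on the accepted connection. The helper name `peer_credentials` is made up.

```python
import os
import socket
import struct

# Linux's struct ucred is { pid_t pid; uid_t uid; gid_t gid; }.
UCRED_FORMAT = "iII"


def peer_credentials(conn: socket.socket) -> tuple[int, int, int]:
    """Return (pid, uid, gid) of the peer of a connected AF_UNIX socket (Linux only)."""
    raw = conn.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(UCRED_FORMAT)
    )
    return struct.unpack(UCRED_FORMAT, raw)


# Demo: both ends belong to this process, so we expect our own credentials.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(left)
left.close()
right.close()
```

The credentials are the ones recorded when the socket was connected (or when the pair was created), so they identify a user and a pid, not a particular binary.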
For TCP, your only easy option is to have port <1024 - but that requires root. If you want a dedicated user, then TCP requires hacks - like creating a cookie file in some protected location, like XAuthority does.
But if you have a protected location, why even bother with all this? Just create a UNIX socket there directly, after all the difference is only in connect call, read/write loop is the same. And as an extra bonus there is much better visibility, and zero chance of someone accidentally grabbing your magic number.
Sorry to be pedantic, but this doesn't really allow you to lock down the socket to "a specific process" does it? You're talking about restricting it to root, or another particular user/group.
I'm interested in this as I've been working on a problem myself where I'm trying to restrict access to a specific process (or a specific application), without much care for which user is running that process. On mobile, there are lots of solutions for protected locations (as you suggest) that allow sharing files across applications within a publisher, for example.
Restricting use to "a specific application for any user" sounds pretty dodgy, security-wise. Linux makes no guarantee that a process is protected from the user executing it, so it is entirely possible that your process has the right name but runs different code. LD_PRELOAD and ptrace immediately come to mind, but I am sure there are other methods too.
That's why Android assigns a unique UID per app: it turns the insecure "restrict by process name" problem into the well-supported "restrict by UID/GID" one.
(And if there is no need for a security boundary, and you only want a convenience check to avoid non-malicious mistakes? Then just hardcode a magic string in your app and check it as part of the protocol.)
You can check socket credentials, indeed. You can set up filtering rules to match on UID using nftables. You can do things like put a cookie somewhere else to exchange and authenticate the connection a-la xauth. You could use TLS and check the host key vs. a public key stored at install time. There are many ways to do this, none of which require more than a few dozen lines of code/config.
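To give a sense of scale for the cookie option, here is a rough sketch. The file name and helper names are made up, and a temporary directory stands in for wherever the cookie would really live. Note that it only restricts access to whoever can read the file, exactly like xauth.

```python
import hmac
import os
import secrets
import stat
import tempfile


def write_cookie(path: str) -> bytes:
    """Create a fresh random cookie in a file readable only by its owner."""
    cookie = secrets.token_bytes(32)
    # O_EXCL refuses to reuse a file someone else may have planted at this path.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(cookie)
    return cookie


def cookie_matches(expected: bytes, presented: bytes) -> bool:
    """Compare cookies in constant time to avoid leaking bytes through timing."""
    return hmac.compare_digest(expected, presented)


# Demo: the "server" writes the cookie, the "client" reads it back and presents it.
demo_dir = tempfile.mkdtemp(prefix="example-cookie-")
cookie_path = os.path.join(demo_dir, "cookie")
server_cookie = write_cookie(cookie_path)
with open(cookie_path, "rb") as f:
    client_cookie = f.read()

cookie_mode = stat.S_IMODE(os.stat(cookie_path).st_mode)
accepted = cookie_matches(server_cookie, client_cookie)
rejected = cookie_matches(server_cookie, b"\x00" * 32)

os.unlink(cookie_path)
os.rmdir(demo_dir)
```

The client would send the cookie as the first message on the connection, and the server would drop the connection if `cookie_matches` returns False.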
But really the simplest thing would just be to use a port <1024 so that only root can open it. That's literally what the feature was for. You can still be "attacked", but only by someone who already has local root.
None of that (save for running as root, which is very crude, much less granular, and requires promoting privileges of the process in question to root) is "about the same amount of work" as using a unix socket directly.
If the daemon isn't running as root it can't put the socket in a secure location, requiring more code. That code isn't complicated, but neither are any of the suggestions above.
Once more: people wanting to make this security bug about the specific socket family in use are doing bad security analysis. There's nothing wrong with TCP, the app just did it wrong and failed to recognize the security boundary being crossed.
This is all well and good if you want to restrict access to root users, but I thought we were trying to restrict access "to a specific process" (i.e. a specific client application.)
Open the socket and drop privilege before launching the daemon. I mean, come on: inetd could do this back in 4.3BSD on a VAX.
I remain absolutely dumbfounded how people in this subthread are going to the mattresses trying to explain why Unix sockets are great and TCP isn't, when they both suck in exactly the same way and the correct answer is "validate your input" and not "use a different API".
I'm not trying to explain why Unix sockets are great and TCP isn't... I'm trying to solve a real-world problem along a similar vein myself. FWIW, I agree that you should use Unix sockets for local-machine access - you can't accidentally expose them off the box like you can a TCP socket. But that's neither here nor there.
You seem to be misunderstanding the scenario I'm describing: I have a daemon that runs in a privileged context (as root.) I have a client that connects to the daemon, as any user on the box. The client cannot be run as root because the user does not have permission to do so.
I want to ensure that only my client can connect to the daemon. I can't use user/group permissions, because I don't care what user/group has access. I want to make sure a specific process (or a specific binary/executable) has access. To quote the comment I initially responded to:
> it's equally true that you could lock down a TCP socket to a specific process with about the same amount of work.
On a Unix machine, this is often done by creating a group to use for access (e.g. a docker group.) This works to lock down a TCP socket to a specific group but not to a specific process. Using shared secrets stored elsewhere on the box also doesn't help here, since any other process could access those secrets.
The best I know of is using something like XPC on macOS, using SO_PEERCRED and checksumming the executable behind /proc/<pid>/exe, or perhaps using some other platform-specific code signing API.
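The /proc half of that idea is short to write down. This is a Linux-only sketch with a made-up helper name, demonstrated on the current process; a daemon would pass the pid obtained from SO_PEERCRED and compare the digest against a known-good value.

```python
import hashlib
import os


def exe_sha256(pid: int) -> str:
    """Hash the executable a process is running, via /proc/<pid>/exe (Linux only)."""
    digest = hashlib.sha256()
    with open(f"/proc/{pid}/exe", "rb") as f:
        # Read in 1 MiB chunks so large binaries are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on ourselves: this hashes the interpreter binary running the script.
own_digest = exe_sha256(os.getpid())
```

It is racy and easy to defeat: the pid can be reused between the credential lookup and the read, and a matching binary can still be running injected code (LD_PRELOAD, ptrace), so treat it as a sanity check and not a security boundary.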
I was excited to hear that it was easy. I'm disappointed now.
> Bottom line is that you need to validate your input from outside the process if you're running in a privileged context[1]
What's with this "if" qualifier? You need to validate all input from outside the process. Whether the process is privileged or not is, frankly, not really relevant.
(I submitted a blog post a few days ago explaining "Parse, Don't Validate" in plain C, but it didn't get any traction).
> What's with this "if" qualifier? You need to validate all input from outside the process.
Not all tools are designed to accept input from outside a security boundary. Obviously atop isn't one, but the world is filled with software that misbehaves on bad input. Ever DDoS your build system by misconfiguring something? Crash a running program by removing a cache directory (or unpacking a tarball on top of it)?
It's very rarely a bad idea to validate input. But it's for sure not always a requirement either.
And to be blunt, it's not really possible either. You write "insecure" parsers/interpreters/whatever probably every day, we all do. And you "know" when it's safe and when it's not, I'm sure. But my point is that if that knowledge isn't based on at least a little bit of rigor ("crossing a privilege boundary" in this case), you're probably going to do it wrong.
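As a concrete picture of what that rigor looks like at a boundary, here is a small "parse, don't validate" style sketch. The `COUNT:item,item` wire format is invented for the example and is not atopgpud's actual protocol. The idea is that untrusted bytes either become a typed `Reply` or raise, so nothing downstream ever indexes into unchecked data.

```python
from dataclasses import dataclass

MAX_REPLY_BYTES = 4096   # hypothetical limits, chosen only for the example
MAX_ITEMS = 64


@dataclass(frozen=True)
class Reply:
    items: tuple[str, ...]


def parse_reply(raw: bytes) -> Reply:
    """Turn untrusted bytes of the form b'COUNT:item,item,...' into a Reply, or raise."""
    if len(raw) > MAX_REPLY_BYTES:
        raise ValueError("reply too long")
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError as exc:
        raise ValueError("reply is not ASCII") from exc

    count_text, sep, rest = text.partition(":")
    if not sep or not count_text.isdigit():
        raise ValueError("missing or non-numeric count")
    count = int(count_text)
    if count > MAX_ITEMS:
        raise ValueError("too many items")

    items = tuple(rest.split(",")) if rest else ()
    # Never trust the declared count on its own: it must agree with the payload.
    if len(items) != count:
        raise ValueError("count does not match payload")
    return Reply(items)


good = parse_reply(b"2:gpu0,gpu1")
empty = parse_reply(b"0:")
```

In C the same shape applies: check the declared count against the buffer you actually received before using it as a loop bound or an index.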
No. It's local, but atop always tries to connect, and the daemon it tries to connect to is optional, which means the default setup is attackable. An attacker can run their own program on that port and send bad strings that will cause an overflow.
well, the first post opened with "You might want to stop running atop" and followed with "Right now, I think it's probably best if you uninstall atop. I don't mean just stopping it, but actually keep it from being executed."
Which does indeed hint at something much worse IMO.
To be clear: I value rachael's opinion and contributions greatly. Maybe these days I'm just a little grouchy about panicky security people making us spend hours in the middle of the week uninstalling atop from hundreds of systems that wouldn't have been at risk from something like this.
Unlikely, since the use of a local TCP port was introduced later than the quoted sentence. Granted, I did skim, but after having it clarified and rereading, I think that introduction is misleadingly phrased and would benefit from a clearer delineation of the previous vulnerable behavior and the fixed behavior.
> The vulnerability is caused by the fact that atop always tries to connect to the TCP port of 'atopgpud' during initialization. When another local program has been started (instead of 'atopgpud') that listens to this TCP port, atop connects to that program. Such program is able then to send unexpected strings that may lead to parsing failures in atop. These failures result in heap problems and segmentation faults.
Okay, so, if I have a shell and the rights to listen on a host, I can crash the "atop" of other users? That's it? I could also create a fork bomb, fill up the disk, use all CPU and memory, etc...
Not the same thing at all if atop runs as root and you are a user on that system that has no root access. With a well-prepared exploit you could achieve code execution as root. That's a bit more than a simple Denial of Service by filling up the disk.
I have a semi-related question. For someone whose main job is not maintaining or running full Linux servers but who would like information about processes and their RAM/CPU usage, etc., what would be a good tool that is easy to parse and has good defaults?
Seconding btop++, been running it as my main top for a few years now, and switched from htop. I didn't have a single complaint about htop, did what it said on the tin and did it well in my experience, but personally I prefer btop's ux/ui.
Yes. Any local process can connect to a TCP port (unless special care is taken), so it should be a last-resort option. Additionally, the server either needs to run as root to bind a privileged port, or any application can race it to bind that port. UNIX sockets are a much better option, as they can be protected by filesystem permissions, including who can bind the socket and who can connect to it.
This can be mitigated by having authentication inside the socket, but now your authentication code is an attack surface and how are you going to share the secrets? On the filesystem? You are basically back to a UNIX socket with extra steps.
... can we assume these will be updated to the actual vulnerability (CWE-940, CWE-120?), and vulnerable versions (2.4.0 through 2.11.0)? Or was the vaguepost about an entirely different vulnerability? Does anyone yet know what specific issue the vaguepost was alluding to?
atop freaks out if it isn't talking to the thing it thinks it's talking to... who would have thunked it... I feel like a lot of programs have that issue.
It's acceptable to freak out by crashing. It's even acceptable to crash via explicit assertion failure if the developers don't want to write proper error handling. It's not acceptable to crash via segmentation fault.
It's to an extent even acceptable to crash via segmentation fault (more specifically, doing whatever unsafe exploitable things may come of the source of the issue) if it takes the same amount of privileges to cause the crash as the thing crashing has.
And that's the important thing violated here: atop is rather reasonably run by root to examine root processes, whereas the exploiter just needs the ability to host a thing on a specific port.
It can be difficult to prove that an out-of-bounds memory reference triggered by malformed input will always result in a segmentation fault instead of a read or write of an "interesting" memory location.
This depends. In this case, I guess the issue is that there is some out-of-bounds memory reference. But, for example, a null pointer dereference resulting in a segmentation fault is not (necessarily) a security problem.
HN post detailing how we got it: https://news.ycombinator.com/item?id=43519522
Edit: Here's our reproducer and we've added it to the post too: https://gist.github.com/kallsyms/3acdf857ccc5c9fbaae7ed823be...