Atop 2.11 heap problems

ianbutler · 2025-03-29T23:03:41 1743289421

Hey guys we commented on another thread from a few days ago about our tool Bismuth finding the bug (along with a sha of our reproducer script for proof) https://news.ycombinator.com/item?id=43489944

After disclosing and having correspondence with Gerlof and from his above post it looks like we did in fact nail it and I've just shared our write up on how we got it.

HN post detailing how we got it: https://news.ycombinator.com/item?id=43519522

Edit: Here's our reproducer and we've added it to the post too: https://gist.github.com/kallsyms/3acdf857ccc5c9fbaae7ed823be...

hannob · 2025-03-30T07:41:25 1743320485

> HN post detailing how we got it: https://news.ycombinator.com/item?id=43519522

I don't see any details there. Is there some link missing here, or is it the wrong link?

I'd be interested to read how your tool found it.

stavros · 2025-03-30T10:29:18 1743330558

It's just "we asked our LLM and it found the bug", as I understand it.

saagarjha · 2025-03-29T23:08:43 1743289723

What is that a hash of?

ianbutler · 2025-03-29T23:08:54 1743289734

As noted, our reproducer script

saagarjha · 2025-03-29T23:10:39 1743289839

Right, but where’s the script?

ianbutler · 2025-03-29T23:17:49 1743290269

https://gist.github.com/kallsyms/3acdf857ccc5c9fbaae7ed823be...

From my co-founder's account

saagarjha · 2025-03-29T23:39:56 1743291596

Cool, thanks for adding it. It would also be nice if you posted how you generated the hash :) I’m not trying to be annoying but this is a critical part of how these hashes work; you post the hash early to indicate you have some information early and then later you demonstrate that by actually presenting the artifact with that hash. If you don’t publish the artifact so people can check that it is actually what you claim it is then your hash is worthless (as nobody can prove it’s not, like, the hash of a cat photo). And you’d generally want to demonstrate how you generated the hash just so people don’t have to figure out whether to md5 or sha1sum it.

kallsyms · 2025-03-29T23:53:24 1743292404

Hey yeah got caught up in the excitement of finding it :)

It's a SHA256 - `shasum -a 256 server.py`
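For anyone who would rather check the artifact from a script than with `shasum`, here is a minimal sketch of the same verification in Python. The function names are made up for illustration and are not part of the gist; only `hashlib.sha256` is standard.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_commitment(path, published_hex):
    """Compare a file against a hash that was published earlier.

    Hypothetical helper: tolerates surrounding whitespace and
    upper-case hex in the published value.
    """
    return sha256_of_file(path) == published_hex.strip().lower()
```

Calling `matches_commitment("server.py", "<hash from the earlier comment>")` should return `True` only if the published file is byte-for-byte the one that was hashed.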

geerlingguy · 2025-03-29T23:29:40 1743290980

This doesn't seem nearly as nefarious as the post from earlier this week indicated... I had expected a full supply chain compromise or something that bad based on the earlier post.

barotalomey · 2025-03-30T03:04:29 1743303869

Yeah, my first thought was that this is an unrelated find, thanks to more eyeballs since the recent focus.

f33d5173 · 2025-03-30T01:23:03 1743297783

Yeah being taciturn was really the worst thing you could do

echoangle · 2025-03-29T21:39:51 1743284391

"Problems with the heap" - https://news.ycombinator.com/item?id=43485980

dang · 2025-03-30T00:35:44 1743294944

Thanks! Macroexpanded:

Problems with the heap - https://news.ycombinator.com/item?id=43485980 - March 2025 (93 comments)

You might want to stop running atop - https://news.ycombinator.com/item?id=43477057 - March 2025 (139 comments)

cullenking · 2025-03-30T03:58:21 1743307101

I was bit by atop a few years back and swore it off. I would get perfectly periodic 10m hangs on MySQL. Apparently they changed the default runtime options such that it used an expensive metric gathering technique with a 10m cron job that would hang any large memory process on the system. It was one of those “no freaking way” revelations after 3 days troubleshooting everything.

Interesting reading through the related submission comments and seeing other hard to troubleshoot bugs. I don’t think atop devs are to blame, my guess is that what you have to do to make a tool like atop work means you are hooking into lots of places that have potential to have unintended consequences.

unsnap_biceps · 2025-03-29T21:53:45 1743285225

It's unfortunate that Unix sockets aren't being used for local connections like this.

charcircuit · 2025-03-29T23:18:08 1743290288

It's more unfortunate a proper RPC library is not being used. People rolling their own buggy parsers in C is an endless source of bugs.

ahoka · 2025-03-30T10:45:02 1743331502

The whole code is horrible: https://github.com/Atoptool/atop/commit/542b7f7ac52926ca2721...

Inconsistent usage of braces, no clear memory ownership or life-cycles, zero tests.

the-lazy-guy · 2025-03-30T15:45:23 1743349523

Can you please provide an example of good C code?

I agree that the absence of tests isn't great, and is very common with many C-based projects. But the rest of your comment reads like "ooh, it's C, disgusting!". I hope I'm wrong.

woodruffw · 2025-03-30T18:10:55 1743358255

sqlite3 is the canonical example of a mature, well-structured, excellently tested C codebase. I would also submit cURL/libcURL as a strong example.

the-lazy-guy · 2025-03-30T22:13:06 1743372786

Thank you. These 2 are well-known, as are plenty of others. But I wanted to see an answer from the author of the comment to which I replied. Apart from tests (of which both sqlite and curl have plenty, and that is obviously good), I don't see any reasonable difference in sqlite or curl code in the aspects which were mentioned in their comment (namely, style and ownership). I'd like to see what they think is reasonable C code.

timcobb · 2025-03-30T00:45:42 1743295542

> People rolling their own buggy parsers in C

I'd like to believe this isn't common anymore for new projects?

worthless-trash · 2025-03-30T03:19:13 1743304753

I don't want to ruin your weekend.

ajross · 2025-03-29T22:26:53 1743287213

Meh. This isn't a technology choice problem. Routine unix sockets are just some file in /tmp which an attacker could likewise open by racing against the daemon in the same way.

It's true you could use a privileged spot in the filesystem and set things up to use that by writing some simple extra software, but it's equally true that you could lock down a TCP socket to a specific process with about the same amount of work.

Bottom line is that you need to validate your input from outside the process if you're running in a privileged context[1], and atop didn't.

[1] It's not mentioned in the linked email, but I assume the core problem here (and the reason it got a CVE number) is that the atop binary is setuid?

adrianmonk · 2025-03-29T23:15:59 1743290159

> Routine unix sockets are just some file in /tmp which an attacker could likewise open by racing against the daemon in the same way.

So put the socket in /run instead of /tmp?

I'm no expert, but this appears to be where they belong, and it appears to solve the problem. From https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s15.htm... : "System programs that maintain transient UNIX-domain sockets must place them in this directory or an appropriate subdirectory as outlined above." ... "/run should not be writable for unprivileged users; it is a major security problem if any user can write in this directory."

ajross · 2025-03-30T00:49:06 1743295746

Putting them in /run if you're not already root requires a little extra software be written though. Locking down a TCP socket isn't much harder. I'm not saying "don't use Unix domain sockets", I'm saying that treating this bug as the result of technology choice is bad security analysis.

Zardoz84 · 2025-03-30T09:38:57 1743327537

The real problem is the buggy parser, and that is enabled by default, even if you aren't showing anything related to the GPU or launched the daemon.

adrianmonk · 2025-03-30T15:54:08 1743350048

> if you're not already root

Hmm, good point. I think we made opposite assumptions about that.

If the daemon does run as root, then no extra software is required. For Unix domain sockets, you can trivially create your socket in /run, and for TCP, you can trivially use a port below 1024.

If it doesn't, then some extra software or configuration is required in either case.

I tried looking it up, and I think it does run as root[1]. But I also found that the daemon uses a Python library to get GPU stats, and root might or might not be required depending on how the GPU software is configured[2]. So it could have gone either way.

---

[1] That's how I read this: https://github.com/Atoptool/atop/blob/master/atopgpu.service

[2] See https://github.com/gpuopenanalytics/pynvml/issues/19

fpoling · 2025-03-29T23:56:55 1743292615

These days Unix sockets for system daemons should be placed under /run with permissions that only a particular daemon can access for binding. With systemd service and socket units it is trivial to do.
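Without systemd socket units, the same idea can be done by hand. This is a rough sketch in Python, assuming Linux, where the mode of a freshly bound socket file follows the umask. The function name and the default mode are illustrative choices, not anything atop or atopgpud does.

```python
import os
import socket


def bind_restricted_unix_socket(path, mode=0o660):
    """Bind a listening Unix socket whose file is never world-accessible."""
    try:
        os.unlink(path)  # remove a stale socket left by a previous run
    except FileNotFoundError:
        pass

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # Tighten the umask so the socket file is created as 0600; this avoids
    # a window in which other users could connect before chmod runs.
    old_umask = os.umask(0o177)
    try:
        server.bind(path)
    finally:
        os.umask(old_umask)

    os.chmod(path, mode)  # e.g. 0660: owner and group only
    # A root daemon could additionally os.chown() the path to a dedicated
    # group here; that step is left out because it needs privileges.
    server.listen(1)
    return server
```

The directory matters as much as the file mode: if the socket lives somewhere like /run that unprivileged users cannot write to, nobody else can bind that name first.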

3np · 2025-03-30T02:53:50 1743303230

> but it's equally true that you could lock down a TCP socket to a specific process with about the same amount of work.

How, actually? With UNIX sockets it can be a matter of setting file ownership and mode (at worst, a chmod and a chown).

What's the equally simple way to restrict access to a locally listening tcp socket?

johnmaguire · 2025-03-29T22:47:54 1743288474

> but it's equally true that you could lock down a TCP socket to a specific process with about the same amount of work.

Can you educate me? I'm familiar with SO_PEERCRED that returns the user/group/pid on the other end. Would you then checksum the exe of the pid from /proc?

theamk · 2025-03-29T23:13:20 1743290000

SO_PEERCRED is only for Unix domains though, it's not going to work for TCP.

For TCP, your only easy option is to have port <1024 - but that requires root. If you want a dedicated user, then TCP requires hacks - like creating a cookie file in some protected location, like XAuthority does.

But if you have a protected location, why even bother with all this? Just create a UNIX socket there directly, after all the difference is only in connect call, read/write loop is the same. And as an extra bonus there is much better visibility, and zero chance of someone accidentally grabbing your magic number.

Unix sockets are really underappreciated.
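To make the SO_PEERCRED part concrete, here is a small Linux-only sketch in Python. It assumes the usual `struct ucred` layout of one signed and two unsigned 32-bit integers (pid, uid, gid); `peer_credentials` is a name made up for this example.

```python
import os
import socket
import struct

# Assumed layout of Linux's struct ucred: pid_t, uid_t, gid_t.
_UCRED = struct.Struct("iII")


def peer_credentials(conn):
    """Return (pid, uid, gid) of the peer on a connected AF_UNIX socket."""
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, _UCRED.size)
    return _UCRED.unpack(raw)


if __name__ == "__main__":
    # socketpair() creates a connected AF_UNIX pair, so the "peer" is us.
    left, right = socket.socketpair()
    print(peer_credentials(left), os.getpid())
    left.close()
    right.close()
```

The credentials are captured by the kernel when the connection is made, so the client cannot forge them, but they identify a user and group, not a particular executable.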

johnmaguire · 2025-03-31T15:48:38 1743436118

Sorry to be pedantic, but this doesn't really allow you to lock down the socket to "a specific process" does it? You're talking about restricting it to root, or another particular user/group.

I'm interested in this as I've been working on a problem myself where I'm trying to restrict access to a specific process (or a specific application), without much care for which user is running that process. On mobile, there are lots of solutions for protected locations (as you suggest) that allow sharing files across applications within a publisher, for example.

theamk · 2025-03-31T20:24:21 1743452661

Correct, this is for specific user/group.

Restricting use to "specific application for any user" sounds pretty dodgy, security-wise. Linux makes no guarantees that processes are protected from executing user, so it is entirely possible your process has the right name, but runs different code. LD_PRELOAD and ptrace immediately come to mind, but I am sure there are other methods too.

That's why Android makes a unique UID per app - this turns insecure "restrict by process name" problem into well-supported "restrict by UID/GID".

(And if there is no need for a security boundary, and you only want a convenience check to avoid non-malicious mistakes? Then just hardcode a magic string in your app and check it as part of the protocol.)

ajross · 2025-03-29T22:52:22 1743288742

You can check socket credentials, indeed. You can set up filtering rules to match on UID using nftables. You can do things like put a cookie somewhere else to exchange and authenticate the connection a-la xauth. You could use TLS and check the host key vs. a public key stored at install time. There are many ways to do this, none of which require more than a few dozen lines of code/config.

But really the simplest thing would just be to use a port <1024 so that only root can open it. That's literally what the feature was for. You can still be "attacked", but only by someone who already has local root.

3np · 2025-03-30T04:22:49 1743308569

None of that (save for running as root, which is very crude, much less granular, and requires promoting privileges of the process in question to root) is "about the same amount of work" as using a unix socket directly.

ajross · 2025-03-30T16:41:42 1743352902

If the daemon isn't running as root it can't put the socket in a secure location, requiring more code. That code isn't complicated, but neither are any of the suggestions above.

Once more: people wanting to make this security bug about the specific socket family in use are doing bad security analysis. There's nothing wrong with TCP, the app just did it wrong and failed to recognize the security boundary being crossed.

johnmaguire · 2025-03-31T15:50:42 1743436242

This is all well and good if you want to restrict access to root users, but I thought we were trying to restrict access "to a specific process" (i.e. a specific client application.)

ajross · 2025-03-31T17:02:19 1743440539

Open the socket and drop privilege before launching the daemon. I mean, come on: inetd could do this back in 4.3BSD on a VAX.

I remain absolutely dumbfounded how people in this subthread are going to the mattresses trying to explain why Unix sockets are great and TCP isn't, when they both suck in exactly the same way and the correct answer is "validate your input" and not "use a different API".

johnmaguire · 2025-03-31T19:36:13 1743449773

I'm not trying to explain why Unix sockets are great and TCP isn't... I'm trying to solve a real-world problem along a similar vein myself. FWIW, I agree that you should use Unix sockets for local-machine access - you can't accidentally expose them off the box like you can a TCP socket. But that's neither here nor there.

You seem to be misunderstanding the scenario I'm describing: I have a daemon that runs in a privileged context (as root.) I have a client that connects to the daemon, as any user on the box. The client cannot be run as root because the user does not have permission to do so.

I want to ensure that only my client can connect to the daemon. I can't use user/group permissions, because I don't care what user/group has access. I want to make sure a specific process (or a specific binary/executable) has access. To quote the comment I initially responded to:

> it's equally true that you could lock down a TCP socket to a specific process with about the same amount of work.

On a Unix machine, this is often done by creating a group to use for access (e.g. a docker group.) This works to lock down a TCP socket to a specific group but not to a specific process. Using shared secrets stored elsewhere on the box also doesn't help here, since any other process could access those secrets.

The best I know of is using something like XPC on macOS, using SO_PEERCRED and checksum'ing the pid out of /proc/<pid>/exe, or perhaps using some other platform-specific code signing API.

I was excited to hear that it was easy. I'm disappointed now.

lelanthran · 2025-03-30T12:25:26 1743337526

> Bottom line is that you need to validate your input from outside the process if you're running in a privileged context[1]

What this "if" qualifier? You need to validate all input from outside the process. Whether the process is privileged or not is, frankly, not really relevant.

(I submitted a blog post a few days ago explaining "Parse, Don't Validate" in plain C, but it didn't get any traction).

ajross · 2025-03-31T14:23:41 1743431021

> What this "if" qualifier? You need to validate all input from outside the process.

Not all tools are designed to accept input from outside a security boundary. Obviously atop isn't one, but the world is filled with software that misbehaves on bad input. Ever DDoS your build system by misconfiguring something? Crash a running program by removing a cache directory (or unpacking a tarball on top of it)?

It's very rarely a bad idea to validate input. But it's for sure not always a requirement either.

And to be blunt, it's not really possible either. You write "insecure" parsers/interpreters/whatever probably every day, we all do. And you "know" when it's safe and when it's not, I'm sure. But my point is that if that knowledge isn't based on at least a little bit of rigor ("crossing a privilege boundary" in this case), you're probably going to do it wrong.

eptcyka · 2025-03-30T06:59:32 1743317972

It is. But even with unix sockets, the client should never blindly trust the bytes received and parse them defensively.
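As a sketch of what "parse defensively" can look like, here is a Python parser for a made-up frame format: a 4-byte big-endian length followed by space-separated decimal fields. This is not atop's actual wire format, and the limits are arbitrary assumptions; the point is that every length and every field is checked before it is used.

```python
import struct

MAX_PAYLOAD = 4096   # assumed upper bound on payload size
MAX_FIELDS = 64      # assumed upper bound on number of fields
MAX_DIGITS = 10      # refuse absurdly long numbers


class ProtocolError(ValueError):
    """Raised for any frame that does not match the expected format."""


def parse_frame(data):
    """Parse one hypothetical frame into a list of non-negative ints."""
    if len(data) < 4:
        raise ProtocolError("truncated header")
    (length,) = struct.unpack("!I", data[:4])
    if length > MAX_PAYLOAD:
        raise ProtocolError("declared length too large")
    payload = data[4:]
    if len(payload) != length:
        raise ProtocolError("declared length does not match payload")
    try:
        text = payload.decode("ascii")
    except UnicodeDecodeError as exc:
        raise ProtocolError("payload is not ASCII") from exc
    fields = text.split(" ")
    if len(fields) > MAX_FIELDS:
        raise ProtocolError("too many fields")
    values = []
    for field in fields:
        # isdigit() also rejects empty fields, signs and whitespace.
        if not field.isdigit() or len(field) > MAX_DIGITS:
            raise ProtocolError("malformed numeric field")
        values.append(int(field))
    return values
```

A caller would catch `ProtocolError`, drop the connection, and carry on without that data source. The equivalent C code needs the same checks, plus explicit buffer bounds.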

yjftsjthsd-h · 2025-03-29T21:39:07 1743284347

Ah, there's the other shoe:)

> optional sources, that have to be activated explicitly.

So only locally exploitable, and you have to enable an optional feature? That's ... honestly better than I was worried that it might be

dgacmu · 2025-03-29T21:55:09 1743285309

No. Local, but it always tries to connect, and the daemon to which it tries to connect is optional, which means that the default is attackable. An attacker can run their own program on the port and send bad strings that will cause an overflow.

yjftsjthsd-h · 2025-03-30T00:36:53 1743295013

Oh, I see, thanks.

> Therefore, the default behavior of atop is now not to connect to the TCP port at all.

I missed that now it defaults to not connecting.

MattPalmer1086 · 2025-03-29T21:50:50 1743285050

The fix is to make it optional.

But yeah, I was anticipating something quite a bit worse.

immibis · 2025-03-29T21:43:39 1743284619

> always tries to connect

xyst · 2025-03-29T23:15:11 1743290111

Right, the post on “rachelbythebay” was hinting at something much worse.

brazzy · 2025-03-30T08:02:39 1743321759

How so? It was pretty clear from her second post that it's a local privilege escalation. And that is what it is, and otherwise fairly easily exploitable.

natebc · 2025-03-30T16:21:18 1743351678

well, the first post opened with "You might want to stop running atop" and followed with "Right now, I think it's probably best if you uninstall atop. I don't mean just stopping it, but actually keep it from being executed."

Which does indeed hint at something much worse IMO.

To be clear: I value Rachel's opinion and contributions greatly. Maybe just these days I'm a little grouchy about panicky security people making us spend hours during the middle of the week uninstalling atop from hundreds of systems that wouldn't have been at risk from something like this.

mvdtnz · 2025-03-29T21:57:20 1743285440

Did you stop reading at that sentence?

yjftsjthsd-h · 2025-03-30T00:41:12 1743295272

Unlikely, since the use of a local TCP port was later than the quoted sentence. Granted, I did skim, but after having it clarified and rereading, I think that introduction is misleadingly phrased and would benefit from clearer delineation of the previous vulnerable behavior and the fixed behavior.

Galanwe · 2025-03-30T10:43:38 1743331418

> The vulnerability is caused by the fact that atop always tries to connect to the TCP port of 'atopgpud' during initialization. When another local program has been started (instead of 'atopgpud') that listens to this TCP port, atop connects to that program. Such program is able then to send unexpected strings that may lead to parsing failures in atop. These failures result in heap problems and segmentation faults.

Okay, so, if I have a shell and the rights to listen on a host, I can crash the "atop" of other users? That's it ? I could also create a fork bomb, fill up the disk, use all CPU and memory, etc...

TonyTrapp · 2025-03-30T15:43:00 1743349380

Not the same thing at all if atop runs as root and you are a user on that system that has no root access. With a well-prepared exploit you could achieve code execution as root. That's a bit more than a simple Denial of Service by filling up the disk.

bitbasher · 2025-03-30T15:26:41 1743348401

I think the concern is for privilege escalation.

mvdtnz · 2025-03-30T05:55:16 1743314116

So what was the point of Rachel's vagueposting? Was there any kind of NDA or a good reason to be so vague?

brazzy · 2025-03-30T08:03:09 1743321789

Responsible disclosure?

stiild · 2025-03-29T22:39:59 1743287999

I have a semi-related question. For someone whose main job is not maintaining or running full Linux servers but who would like information about processes and their RAM/CPU, etc.: what would be a good tool that is easy to parse, with good defaults?

edoceo · 2025-03-29T23:44:00 1743291840

The tool btop was suggested in the other thread to replace atop and htop.

0manrho · 2025-03-30T04:57:58 1743310678

Seconding btop++, been running it as my main top for a few years now, and switched from htop. I didn't have a single complaint about htop, did what it said on the tin and did it well in my experience, but personally I prefer btop's ux/ui.

worthless-trash · 2025-03-30T03:21:56 1743304916

If you are writing software to parse it, don't use third party tooling. Read the kernel outputs directly (/proc, /sys, etc).

While they do have no guarantee not to change, if they do change any tool you are parsing will also be broken.
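As a rough illustration of reading the kernel output directly, here is a small Python sketch for /proc/meminfo. It assumes the usual `Key:   value kB` line shape and skips anything it does not recognise instead of guessing; the function names are just for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers.

    Values carrying a "kB" unit are converted to bytes; unitless
    values (such as HugePages_Total) are kept as plain counts.
    """
    result = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        parts = rest.split()
        if not parts or not parts[0].isdigit():
            continue  # unexpected value, skip rather than crash
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        result[key.strip()] = value
    return result


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return parse_meminfo(handle.read())
```

On a Linux box, `read_meminfo()["MemAvailable"]` gives the kernel's own estimate in bytes, provided that field exists on the running kernel.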

ezekiel68 · 2025-03-30T16:31:31 1743352291

I recommend.. atop, now that it has been updated to address this issue.

candiddevmike · 2025-03-29T23:15:43 1743290143

Node exporter is a good start, or you could look at Netdata

calvinmorrison · 2025-03-30T00:51:32 1743295892

htop is a decent curses processes manager that's a few miles better than top

Zardoz84 · 2025-03-30T09:41:17 1743327677

I recommend nmon

zitterbewegung · 2025-03-29T22:09:28 1743286168

Is it just me or does this seem like a bad design where a TCP port is exposed to share information?

kevincox · 2025-03-29T22:15:18 1743286518

Yes. Any local process can connect to a TCP port (unless special care is taken) so it should be a last-resort option. Additionally the server either needs to be run as root to bind a privileged port or any application can race over binding that port. UNIX sockets are a much better option as they can be protected by filesystem permissions including who can bind the socket and who can connect to it.

This can be mitigated by having authentication inside the socket, but now your authentication code is an attack surface and how are you going to share the secrets? On the filesystem? You are basically back to a UNIX socket with extra steps.

marginalia_nu · 2025-03-29T22:11:52 1743286312

As long as you bind to localhost it's fine in theory. Though any network code still needs to be rigorously hardened.

echoangle · 2025-03-29T22:30:34 1743287434

> As long as you bind to localhost it's fine in theory

But only if you assume that the data being transferred is public, right?

With the described method, any non-privileged user could access the data from the TCP socket, right?

marginalia_nu · 2025-03-29T23:30:35 1743291035

Information in top isn't much of a secret though.

Havoc · 2025-03-30T11:15:45 1743333345

That sounds less bad than expected

amiga386 · 2025-03-29T23:43:43 1743291823

So, as https://www.cve.org/CVERecord?id=CVE-2025-31160 says:

* CWE-617 Reachable Assertion

* affected from 0 through 2.11.0

... can we assume these will be updated to the actual vulnerability (CWE-940, CWE-120?), and vulnerable versions (2.4.0 through 2.11.0)? Or was the vaguepost about an entirely different vulnerability? Does anyone yet know what specific issue the vaguepost was alluding to?

Zardoz84 · 2025-03-30T09:36:04 1743327364

omg .. Why a TCP port instead of using a UNIX socket ?

taspeotis · 2025-03-29T23:27:50 1743290870

> the parsing of the strings is improved to avoid that heap problems can occur.

Tell me what language you’re using without telling me what language you’re using…

nubinetwork · 2025-03-29T22:25:37 1743287137

atop freaks out if it isn't talking to the thing it thinks it's talking to... who would have thunked it... I feel like a lot of programs have that issue.

kccqzy · 2025-03-29T23:02:17 1743289337

It's acceptable to freak out by crashing. It's even acceptable to crash via explicit assertion failure if the developers don't want to write proper error handling. It's not acceptable to crash via segmentation fault.

dzaima · 2025-03-30T03:02:36 1743303756

It's to an extent even acceptable to crash via segmentation fault (more specifically, doing whatever unsafe exploitable things may come of the source of the issue) if it takes the same amount of privileges to cause the crash as the thing crashing has.

And that's the important thing violated here: atop is rather reasonably run by root to examine root processes, whereas the exploiter just needs the ability to host a thing on a specific port.

uecker · 2025-03-30T11:37:48 1743334668

A segmentation fault is perfectly fine as long as an attacker can not cause any other action before it (but I guess this is the case here).

Polizeiposaune · 2025-03-30T14:09:12 1743343752

Ah, but will it always segmentation fault?

It can be difficult to prove that an out-of-bounds memory reference triggered by malformed input will always result in a segmentation fault instead of a read or write of an "interesting" memory location.

uecker · 2025-03-31T12:24:50 1743423890

This depends. In this case, I guess the issue is that there is some OOB memory reference. But, for example, a null pointer dereference resulting in a segmentation fault is not (necessarily) a security problem.