Hacker News

A big part of the problem is all the tooling around git (like the default github UI) which hides diffs for binary files like these pseudo-"test" files. Makes them an ideal place to hide exploit data since comparatively few people would bother opening a hex editor manually.


How many people read autoconf scripts, though? I think those files are a symptom of the larger problem that many popular C/C++ codebases have these gigantic build files which even experts try to avoid dealing with. I know why we have them but it does seem like something which might be worth reconsidering now that the toolchain is considerably more stable than it was in the 80s and 90s.


How many people read build.rs files of all the transitive dependencies of a moderately large Rust project?

Autoconf is bad in this respect but it's not like the alternatives are better (maybe Bazel).


The alternatives are _better_ but still not great. build.rs is much easier to read and audit, for example, but it’s definitely still the case that people probably skim past it. I know that the Rust community has been working on things like build sandboxing and I’d expect efforts to be a lot easier there than in a mess of m4/sh where everyone is afraid to break 4 decades of prior usage.
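To get a feel for that audit surface, here's a rough sketch of enumerating the build scripts in a vendored dependency tree (the `vendor` layout below is fabricated for illustration):

```shell
# Sketch: list the build scripts a reviewer would have to audit in a
# vendored Rust dependency tree. The "vendor" layout here is illustrative.
mkdir -p vendor/crate-a vendor/crate-b/src
touch vendor/crate-a/build.rs vendor/crate-b/build.rs vendor/crate-b/src/lib.rs

find vendor -name build.rs -type f | sort

rm -rf vendor
```

On a moderately large project this easily turns up dozens of files, each of which runs arbitrary code at build time.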


build.rs is easier to read, but it's the tip of the iceberg when it comes to auditing.

If I were to sneak in some underhanded code, I'd do it through either a dependency that is used by build.rs (not unlike what was done for xz) or a crate purporting to implement a very useful procedural macro...


Bazel has its problems, but the readability is definitely better. And Bazel BUILD files are quite constrained in what they can do.


I mean, autoconf is basically a set of template programs for sniffing out whether a system has X symbol available to the linker. Any replacement for it would end up morphing into it over time.

Some things are just that complex.


We have much better tools now and much simpler support matrices, though. When this stuff was created, you had more processor architectures, compilers, operating systems, etc. and they were all much worse in terms of features and compatibility. Any C codebase in the 90s was half #ifdef blocks with comments like “DGUX lies about supporting X” or “SCO implemented Y but without option Z so we use Q instead”.


That's the essence of programming in C though...

You figure out what the hardware designers actually did, and get the program written to accommodate it.


I don't see how showing the binary diffs would help. 99.99999% of people would just scroll right past them anyways.


Even in binary you can see patterns. I'm not saying showing binary diffs is perfect (though it's better than showing nothing), but even my slow mammalian brain can spot obvious human-readable characters in various binary encoding formats. If I see a few in a row that don't make sense, why wouldn't I poke at it?
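For instance, a quick pass with `strings` surfaces exactly those runs of printable bytes (the fixture file and its contents below are fabricated for illustration):

```shell
# Sketch: triage a binary fixture by pulling out runs of printable characters.
# fixture.bin and its contents are fabricated for illustration.
printf 'garbage\000\001\002/bin/sh -c something\003\004more' > fixture.bin

# Runs of 8+ printable bytes are where embedded paths and commands show up.
strings -n 8 fixture.bin

rm -f fixture.bin
```

Short runs like "garbage" and "more" fall under the threshold; the shell command pops right out.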


This particular file was described as an archive file with corrupted data somewhere in the middle. Assuming you wanted to scroll that far through a hexdump of it, there could be pretty much any data in there without being suspicious.


What should I look for? The evil bit set?


Sure, but the person who's going to be looking is the same person who'd click "show diff".


00011900: 0000 4883 f804 7416 b85f 5f5f 5f33 010f  ..H...t..____3..
00011910: b651 0483 f25a 09c2 0f84 5903 0000 488d  .Q...Z....Y...H.
00011920: 7c24 40e8 5875 0000 488b 4c24 4848 3b4c  |$@.Xu..H.L$HH;L
00011930: 2440 7516 4885 c074 114d 85ff 0f84 3202  $@u.H..t.M....2.
00011940: 0000 498b 0ee9 2c02 0000 b9fe ffff ff45  ..I...,........E
00011950: 31f6 4885 db74 0289 0b48 8bbc 2470 1300  1.H..t...H..$p..
00011960: 0048 85ff 0f85 c200 0000 0f57 c00f 2945  .H.........W..)E
00011970: 0048 89ac 2470 1300 0048 8bbc 2410 0300  .H..$p...H..$...
00011980: 0048 8d84 2428 0300 0048 39c7 7405 e8ad  .H..$(...H9.t...
00011990: e6ff ff48 8bbc 24d8 0200 0048 8d84 24f0  ...H..$....H..$.
000119a0: 0200 0048 39c7 7405 e893 e6ff ff48 8bbc  ...H9.t......H..
000119b0: 2480 0200 0048 8d84 2498 0200 0048 39c7  $....H..$....H9.
000119c0: 7405 e879 e6ff ff48 8bbc 2468 0100 004c  t..y...H..$h...L

Please tell me what this code does, Sheldon


You're right - the two exploit files are lzma-compressed and then deliberately corrupted using `tr`, so a hex dump wouldn't show anything immediately suspicious to a reviewer.

Mea culpa!


Is this lzma compressed? Hard to tell because of the lack of formatting, but this looks like amd64 shellcode to me.

But that's not really important to the point - I'm not looking at a diff of every committed favicon.ico or ttf font or binary test file to make sure it doesn't contain shellcode.


it's just an arbitrary section of libcxx-18.1.1/lib/libc++abi.so.1.0


Test data should not be on the same machine where the build is done. Test data (and tests generally) aren't audited as closely, and therefore shouldn't be allowed to leak into the finished product.

Sure - you want to test stuff, but that can be done with a special "test build" in its own VM.


In the Bazel build system, you would mark the test data blob as testonly=1. Then the build system guarantees that the blob can only be used in tests.

This incident shows that killing the autoconf goop is long overdue.


That could easily double build cost. Most open-source package repositories are not exactly in a position to splurge on their build infra.


In this case the backdoor was hidden in a nesting doll of compressed data manipulated with head/tail and tr, even replacing byte ranges in between. It would've been impossible to find if you were just looking at the test fixtures.
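A tiny sketch of the core trick (rot13 stands in for the actual byte mapping, and the payload is fabricated; the real backdoor's pipeline was far more elaborate):

```shell
# Sketch: a reversible byte substitution makes a payload look like corrupt
# data in a dump, and a single tr in a build script restores it. rot13
# stands in for the actual mapping; the payload is fabricated.
payload='could be anything'

hidden=$(printf '%s' "$payload" | tr 'A-Za-z' 'N-ZA-Mn-za-m')
printf 'on disk:  %s\n' "$hidden"

# Applying the same tr again undoes the substitution at "build time".
printf 'restored: %s\n' "$(printf '%s' "$hidden" | tr 'A-Za-z' 'N-ZA-Mn-za-m')"
```

To a reviewer diffing the repo, only the "on disk" form is ever visible, and it reads as noise.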



