Hacker News

> I wouldn't be surprised if "parse and compile 8MB of integer literals" isn't a very well optimized code path in the compiler, because nobody is parsing and compiling 8MB of integer literals outside of artificial exercises like this

Actually, no. There is no portable "incbin" in C/C++, so one way to bake assets into the .rodata section is to convert a binary file into a huge "const unsigned char data[8192567] = { 0xFF, 0xD8, 0xFF, 0xE0, ... }" string and stash it into a .h or .c file. And yes, people do that often enough that gcc and clang actually had to optimize for this case specifically.



C23 gets #embed: https://thephd.dev/finally-embed-in-c23 — and in practice later C++ compilers will presumably just support #embed as well, rather than pretend they aren't the same code base anyway.

However, even though there's a clear need, as that post explains, the compilers varied between "bad at this" and "completely awful at this", which is part of what made the years of getting this through committee so frustrating for JeanHeyd.

In principle #embed just shoves the bytes in as comma-separated values, but in practice, under the "as if" rule in C and C++, the compiler won't literally do that, because doing so would be needlessly slow.


Since Rust 1.67.0, the compiler has a specific optimized code path for the (already older) include_bytes! macro: https://github.com/rust-lang/rust/pull/103812#issuecomment-1...


Ha, cute. I think I began examining the details of include_bytes! only this year, so it's possible I happened to first look at this soon after the big improvements landed in the release I was testing against.


Also, a reminder that JS engines do regularly have to contend with enormous JSON and do not generally display pathological behavior for large literals.

I'm not sure who posted that comment, but if they are on the Swift team and blaming users, that should stop. Code like that is out there, and it's the job of tooling to handle it gracefully.


> JS engines do regularly have to contend with enormous JSON and do not generally display pathological behavior for large literals

Actually it's common advice to replace very large literals with `JSON.parse("…")` because it's faster, according to Chrome engineers [1]. At Notion we did so for our emoji unicode tables, for a noticeable time-to-interactive improvement over the large literal.

[1]: https://youtu.be/ff4fgQxPaO0


Right, but that's a linear speedup (because the JSON parser is less complex than the JS parser), not a change from superlinear to linear parse time.


It wasn't a Swift team member. The Swift community has a quick response ready for these cases because the issue has been around forever and is a unique enough "simple" case that it, e.g., ends up at #1 on HN.

My understanding of the Swift team's thinking, though a few years out of date, is that it's very, very hard to get the Swift type inference engine to handle a huge blob without type hints, or with literals that could be multiple types.


Three years ago I tried putting constants for a matrix (I think it was 1024x80 or something like that) into a strongly typed array of floats. It still took the compiler 30 minutes to compile. In the end I split the matrix up into rows, with each row being one variable, and a final variable concatenating the rows. No type hint helped at all.

Big wtf moment.


Yeah this one's real strange to me. I feel like it used to be as simple as the [int] definition people suggest in the thread.

I don't miss Xcode too much. IIRC I built some custom script using arcana dropped by Apple employees to output per-function build times. It was great! I knew exactly which functions were slow. But now I am old and lazy lol.


Note that someone who seems to manage a Swift team at Apple left a more encouraging response later: <https://forums.swift.org/t/why-is-swift-so-slow-timeout-in-c...>


Right, but unlike Swift, the type system in JS is not doing computationally expensive type checking for each element. This is the equivalent of creating 1M DOM trees in JS and expecting it to be responsive.

The uncomfortable part of Swift for control freaks: rather than trying to force Swift to create fixed-dimension arrays, you should just make the array dynamic and trust that the compiler is smart.

It's not blaming the user, but the compiler team has to balance functionality ("Why can't the type system give me a better error?") and speed. There's no way they can handle every pathological edge case gracefully.


> computationally expensive type checking

So, an hour's worth of compilation time = 3600 seconds, divided by a million elements, that's 3.6 ms, or about 10 million instructions per element. Just what in the heck is this compiler doing with 10 million instructions on an integer literal? Like, it's never seen integer literals before? And they're all integers. Every element. It's not a "pathological edge case". The whole thread is absurd. People are speaking up and blaming programmers, the language, the type system, when this is so obviously the compiler's fault.


We have absurdly fast hardware these days.

I can barely believe that a single thread, copy pasted quicksort can sort 1M elements in less than 50ms on my computer.

Fast compilers such as TCC should be frequently used as a reality check for compiler teams. Of course the constraints are not the same but still...


I have no idea what's going on under the hood, but I'm sure it's not a linear time process, and all it takes is one of those elements to be "109.1" to flip 1M elements from Ints to Doubles. Yes, they could special-case this (and arrays of Doubles, and Strings), but where do you draw the line? That's now extra code to maintain in the compiler just to support a behavior that should be discouraged. If you need to read in 1M integers, put them in a file and read them in.


> should be discouraged. If you need to read in 1M integers, put them in a file and read them in.

This is exactly the attitude that needs to be addressed. It's hostile to argue with users who are doing something completely legal and tell them they shouldn't do that, they should do it another way, that that behavior needs to be discouraged, etc.

We went through this all the time with V8, trying to tell JS developers they just shouldn't do that because V8 couldn't run that code fast. Or worse, that it had a particular pathology that made certain code absurdly slow. It just doesn't fly. It's V8's job to not go off the rails for user inputs; it should provide good default performance all the time, not get stuck in deopt loops, use absurd amounts of memory, etc. Yeah, and that's hard work.


I hear you, ideally the compiler should be able to manage any arbitrary input in a reasonable time, and catch itself if necessary. Usually it does—sometimes with complex ungrouped arithmetic operations with ambiguous types, Swift will error out and tell you to break up your expression, rather than get stuck.

I would love for the Swift compiler to be dramatically faster, but I understand the challenge, with a powerful type inference engine that supports Generics. It's a resource scarcity problem. If the Swift team spends 10 hours to handle long strings of integer literals, that's 10 hours they haven't put toward features that would benefit a larger audience.


Your take sounds more reasonable. I would think you would get diminishing returns if you try to solve all pathological cases.

Sometimes it is the fault of the language design too. Maybe the language spec needs to be changed. Imagine all the wasted effort to optimize V8 when you could have put static typing into Javascript itself.


Indeed—Chris Lattner has mentioned recently that in retrospect, he regrets certain Swift design decisions which have made the compiler so complex and relatively slow.


> If you need to read in 1M integers, put them in a file and read them in.

..and if you're in a context where you need that data in the binary, because you don't have a reliable place to store a file?

> That's now extra code to maintain in the compiler just to support a behavior that should be discouraged.

If you don't want to maintain the code to properly support a language feature, don't include that feature in the language.


For a compiled language it seems reasonable to assume that "put them in a file and read them in" means at compile time, like the C23 pre-processor feature #embed and the Rust macro include_bytes!

Now, #embed and include_bytes! always give you bytes (in Rust these are definitely u8, an unsigned 8-bit integer; I don't know what is promised in C, but in practice I expect you get the same) because that's what modern files are — whereas maybe you want, say, 32-bit big endian signed integers in this Swift example. But Swift is a higher-level language; it's OK if it has some more nuance here and maybe pays a small performance penalty for it. 10% slower wouldn't be objectionable if we could specify the element type, for example.


> But, Swift is a higher level language, it's OK if it has some more nuance here and maybe pays a small performance penalty for it.

I thought the concern was about compile time performance? I'm not sure what Swift being higher-level has to do with that.


The comments on that Swift forum seem to indicate that the slowness is not caused by the type checker though.


Actually, the XBM image format is C files like this, with a long array of hexadecimal numbers.

https://en.m.wikipedia.org/wiki/X_BitMap


I remember paint programs that could export images as C/Pascal/whatever source code files.


Gimp still does that, it's rather useful if you want to play around with images without a loader.


I once had good reason (or at least I thought so!) to code-generate nested structures into .rodata by way of thousands of static arrays (each with its own variable), referenced by other static arrays, referenced by static structs…

The application was a highly optimized encoding of Aho-Corasick state machines for megabytes of static dictionary strings. The code generation (not in C) was trivial, and along with .rodata came all the benefits of shared pages and lazy loading.

Across a number of compilers I only ran into one bug in 32-bit gcc, which was worked around easily by disabling the (unhelpful) optimization pass that was getting snarled.


> { 0xFF, 0xD8, 0xFF, 0xE0, ... }

You're right of course, but experienced developers know to write those as strings instead of arrays, especially for very large content; otherwise both the compiler and the IDE become painfully slow. I.e. your example would be written "\xFF\xD8\xFF\xE0..."


Don't try that in MSVC. It deliberately chokes on large strings. If you work hard you might put a few thousand integers in a string, but this example has a million integers. Most ways you can attempt that in MSVC time out, abort compilation or emit an error.


For an example of C using this in real life open a .xpm file in a text editor. There are many implementations of a C tool that converts a binary file into a .h.

And yes, it is dumb that we have to do tricks like this in 2023.


finally. #embed

https://thephd.dev/finally-embed-in-c23

tl;dr It was such a big issue they actually managed to get it through the C Committee


> Actually, no. There is no portable "incbin" in C/C++

I kinda hate this kind of argument. I mean, it's true, as .incbin is a GNU assembler directive (FWIW: binutils/llvm objcopy is a better mechanism still for this sort of thing in most contexts, as it doesn't involve source compilation of any kind).

But it leads to ridiculous design decisions, like "I'm going to write 8MB of source code instead of doing the portability work on my own to turn that data into a linkable symbol".

There's a huge gulf in the space between "non-portable" and "impossible". In this case, the problem is trivially solved on every platform ever. Trivial problems should employ trivial solutions, even where they have to involve some per-platform engineering.


> But it leads to ridiculous design decisions, like "I'm going to write 8MB of source code instead of doing the portability work on my own to turn that data into a linkable symbol".

Why is that ridiculous? It strikes me as not necessarily the best, but the most obvious approach, the most portable, and possibly the quickest to implement.

Your toolchain might have a special way to import binary blobs, but a) you’ll have to dig through the docs to find it, b) you’ll probably need to solve the problem again when porting to a different platform, and c) who knows if it actually works, or if there are hidden gotchas?

Sure, if there’s a known tool or option that does the job, you should go ahead and use it. But in general, writing a little script to generate a bunch of boilerplate code is perfectly workable.


> Your toolchain might have a special way to import binary blobs, but a) you’ll have to dig through the docs to find it

This is a corollary: "I don't want to learn my tools, so I'll learn the language standard instead" is fundamentally exactly the problem I'm talking about.

Straight up: C linkage is a 1970s paradigm full of tools that had to run on a PDP-11, and it's vastly simpler than learning C++ or Rust. It's just not "modern" and no one taught it to you, so it looks weird and mysterious. That's the problem!


I go back and forth on this argument when it comes to codegen. Like, you could make the same argument that protobuf shouldn't output C code. It should output an object file that you can link into whatever compiled language you want. C, fortran, C++, rust, who cares. As you say, the linking model is simple and works well.

Why do we generate big C/C++ strings instead, and compile those? Because object files have lots of compiler/platform/architecture specific stuff in them. Outputting C (or C++) then compiling it is a much more convenient way to generate those object files, because it works on every OS, compiler and architecture. Even systems that you don't know about, or that don't exist yet.

I hear what you're saying and I'm torn about it. I mean, aren't binary blobs just a simpler version of the problem protobuf faces? C code is already the most portable way to make object files. Why wouldn't we use it?


> Like, you could make the same argument that protobuf shouldn't output C code.

The case at hand is an 8MB static array of integers. Obviously yes, of course, absolutely: you choose the correct/simple/obvious/trivialest implementation strategy for the problem. That's exactly what I'm saying!

In the case of protobuf (static generation of an otherwise arbitrarily complicated data structure with reasonably bounded size), code generation makes a ton of sense.


And the simple/obvious/trivialest solution is to write an array literal with 4 million integers in it, while fighting with Microsoft's link.exe is anything but. Even using rc.exe and loading that data from the resource section is a non-trivial amount of additional work.


Come on. Both binutils and nasm can generate perfectly working PE object files. I don't know the answer off the top of my head, but I bet anything even pure MSVC has a simple answer here. Dealing with the nonsense in the linked article is something you do for a quick hack or to test compiler performance, but (as demonstrated!) it scales poorly. It's terrible engineering, period. Use the right tools, even if they aren't ISO-specified. And if you can't or won't, please don't get into fights on the internet justifying the resulting hackery.


> I bet anything even pure MSVC has a simple answer here

You lose your bet, because it doesn't: neither its inline assembler nor the actual MASM shipped with Visual Studio supports any "incbin"-like directive. You can generate an .asm with a huge db/dd, I guess, if you don't like a large literal array in .c files, but that's it.


> Come on. Both binutils and nasm can generate perfectly working PE object files. I don't know the answer off the top of my head, but I bet anything even pure MSVC has a simple answer here.

Right, meaning you have to implement N solutions instead of just one. It's a common enough and useful enough feature for the language to support it. I think it would be a different story if linkers were covered by the language specification.


I remain shocked at how controversial this is. Yes. Yes, implementing N trivial and easily maintained solutions is clearly better than one portable hack.


Clearly many people disagree. And given that C now has #embed, I don't even think I'd consider it to be a hack.

> I remain shocked at how controversial this is.

I am a bit shocked that you think the right solution to making data statically available to the rest of your program is somehow outside the scope of the programming language.


The comparison wasn't to #embed[1], but to an 8MB static array. You're winning an argument against a strawman, not me. For the record, I think #embed (given tooling that supports it) would be an excellent choice! That's not a defense of the technique under discussion though.

[1] Which FWIW is much less portable as an issue of practical engineering than assembler or binutils tooling!


It's not "I don't want to learn my tools."

It's "I don't want to learn and debug _everybody else who may possibly want to build this otherwise portable C program_'s tools."

When those tools change how they do this unportable thing every couple of years in subtle and incompatible ways, which require #ifdef's to handle the different ways those linked against symbols can be accessed, multiplied by dozens of different platforms, then yes, I'm going to compile an 8MB literal.


I am pretty certain I've seen linkers routinely writing 0 instead of symbols' actual sizes, so getting the actual size of the embedded binary blob is not very pretty.

Also, your argument sounds exactly like those that the author of https://thephd.dev/finally-embed-in-c23 has been fighting against for 5 years straight.


Yes! That person is a hero.


> It's just not "modern" and no one taught it to you, so it looks weird and mysterious.

I don’t know what to say except that I’ve worked with C linkers for a long time, since before I learned C++ and before Rust even existed, and I still don’t like ‘em.


> But it leads ridiculous design decisions, like "I'm going to write 8MB of source code instead of doing the portability work on my own to turn that data into a linkable symbol".

people use scripts to do this kind of thing and never think about it again, and it's still more portable than writing a custom build step.


> FWIW: binutils/llvm objcopy is a better mechanism still for this sort of thing in most contexts, as it doesn't involve source compilation of any kind

I used to agree with that, but honestly a simple `xxd` to get a C array avoids so many issues with `objcopy` that I'd rather just use that now. With `objcopy` even just getting the names of the produced symbols to be consistent is a pain, and you have to specify the output target and architecture which is just another thing you have to update for more platforms (and if someone's using a cross-compiler, they have to override that setting too).

In contrast if you just produce a C array then it compiles like normal C code and links like normal C code, problem solved and all the complexity is gone.



