Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

>in Rust and C++, the standard library sorting functions are templated on the comparison function. This means it's much easier for the compiler to specialize the sorting function, inline the comparisons and optimize around it.

I think this is something of a myth. Typically, a C compiler can't inline the comparison function passed to qsort because libc is dynamically linked (so the code for qsort isn't available). But if you statically link libc and have LTO, or if you just paste the implementation of qsort into your module, then a compiler can inline qsort's comparison function just as easily as a C++ compiler can inline the comparator passed to std::sort. As for type-specific optimizations, these can generally be done just as well for a (void *) that's been cast to a T as they can be for a T (though you do miss out on the possibility of passing by value).

That said, I think there is an indirect connection between a templated sort function and the ability to inline: it forces a compiler/linker architecture where the source code of the sort function is available to the compiler when it's generating code for calls to that function.



qsort is obviously just an example, this situation applies to anything that takes a callback: in C++/Rust, that's almost always generic and the compiler will monomorphize the function and optimize around it, and in C it's almost always a function pointer and a userData argument for state passed on the stack. (and, of course, it applies not just to callbacks, but more broadly to anything templated).

I'm actually very curious about how good C compilers are at specializing situations like this, I don't actually know. In the vast majority cases, the C compiler will not have access to the code (either because of dynamic linking like in this example, or because the definition is in another translation unit), but what if it does? Either with static linking and LTO, or because the function is marked "inline" in a header? Will C compilers specialize as aggressively as Rust and C++ are forced to do?

If anyone has any resources that have looked into this, I would be curious to hear about it.


If you choose to put a boundary in your code that makes it span over several binaries, so that they can be swapped out at runtime, no compiler in any language can optimize that away, because that would be against the interface you explicitly chose. That's what dynamic linking aka. runtime linking is in C.

This is not an issue for libc, because the behaviour of that is not specified by the code itself, but by the spec, which is why C compilers can and do completely remove or change calls to libc, much to the distress of someone expecting a portable assembler.


Dynamic linking will inhibit inlining entirely, and so yes qsort does not in practice get inlined if libc is dynamically linked. However, compilers can inline definitions across translation units without much of any issue if whole program optimization is enabled.

The use of function pointers doesn't have much of an impact on inlining. If the argument supplied as a parameter is known at compile time then the compiler has no issue performing the direct substitution whether it's a function pointer or otherwise.


My point is that the real issue is just whether or not the function call is compiled as part of the same unit as the function. If it is, then, certainly, modern C compilers can inline functions called via function pointers. The inlining itself is not made easier via the template magic.

Your C comparator function is already “monomirphized” - it’s just not type safe.


Wouldn't C++ and Rust eventually call down into those same libc functions?

I guess for your example, qsort() it is optional, and you can chose another implementation of that. Though I tend to find that both standard libraries tend to just delegate those lowest level calls to the posix API.


Rust doesn't call into libc for sort, it has its own implementation in the standard library.


Obviously. How about more complex things like multi-threading APIs though? Can the Rust compiler determine that the subject program doesn't need TLS and produce a binary that doesn't set it up at all, for example?


Optimising out TLS isn't going to be a good example of compiler capability. Whether another thread exists is a global property of a process, and beyond that the system that process operates in.

The compiler isn't going to know for instance that an LD_PRELOAD variable won't be set that would create a thread.


> Whether another thread exists is a global property of a process, and beyond that the system that process operates in.

TLS is a language feature. Whether another thread exists doesn't mean it has to use the same facilities as the main program.

> The compiler isn't going to know for instance that an LD_PRELOAD variable won't be set that would create a thread.

Say the program is not dynamically linked. Still no?


> Say the program is not dynamically linked. Still no?

Whether the program has dynamic dependencies does not dictate whether a thread can be created, that's a property of the OS. Windows has CreateRemoteThread, and I'd be shocked if similar capabilities didn't exist elsewhere.

If I mark something as thread-local, I want it to be thread-local.


I mean, it’s not that obvious, your parent asked about it directly, and you could easily imagine calling it libc for this.

I beehive the answer to your question is “yes” because no-std binaries can be mere bytes in size, but I suspect that more complex programs will almost always have some dependency somewhere (possibly even the standard library, but I don’t know offhand) that uses TLS somewhere in it.


Many of the libc functions are bad apis with traditionally bad implementations.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: