Ok, so this article is at least contributing to the good side of the discourse about undefined behavior, because it isn't actually delusional about reality. (I mean this in the sense of people being deluded/having the wrong notions about actual facts, by the way, rather than actual craziness. The people who are deluded about undefined behavior tend to say things like "C is definitely a portable assembler" or "compilers are trying to trick me" which are definitely false. A non-deluded person can be upset about the current state but understanding how C currently works is important if you want to discuss it.)
Anyways, to the actual content of the article, I agree with it but I think the frustration it accepts as reasonable is actually misguided. Here is my understanding of how things ended up for C (disclaimer: I was not born when most of this stuff happened.)
In the beginning, you had a C compiler for your computer, and it was basically just an assembler. This is what the "make C great again" people think the language really is under the hood, by the way. However, very quickly people realized that they wanted their C code to run elsewhere, and every computer does things differently, so they needed some sort of standard for approximately the lowest common denominator of most machines, and that became the C standard's definition of what is legal to do. The guarantee created at that point was that if you conform to the standard, every implementation of C has to run your program as the standard specifies. This was palatable to people because they had a bunch of machines with weird byte orderings or whatever, and it was obvious what would happen when the dumb compilers of the time translated their platform-specific code to a new architecture.
Later, the weird architectures started becoming rare. At the same time, though, a new architecture started growing: a virtual architecture, one where the compiler would actually "port" your code to the exact same processor you were compiling for before, but the code would run faster. It would do this by taking latitude through intermediate transformations, which it assumed it could do because your program should have been portable to the abstract machine.
Now, this completely weirded people out, because "I'm compiling to a new virtual architecture called 'x86-64 -O3' that is the same as 'x86-64 -O0' but faster and more restrictive" sounds really stupid. It's the same architecture, and they're not even real processors! But if you really think about it, compilers are really just taking advantage of the fact that your code is portable, because it works in the space of the C abstract machine, to do a "port" called "run a bunch of optimization passes". People understand when a port ends up causing a trap on another processor, because of course it does that on the new platform. But getting people to understand that their unaligned accesses on the "-O0" machine are no longer valid on the "-O3" machine is hard, because, again, the instructions that come out look awfully similar and straightforward most of the time, except for the weird times when a change "surprises" you because the transition between the two crossed through an invalid space. Kind of like a path that seems to have a "weird jump" because it normally crosses through 3D space and at some point someone found a shortcut through the fourth dimension.
Anyways, the performance virtual architecture is all well and good, but what I think will be interesting moving forward is the security virtual architecture, where overflows and out-of-bounds accesses and type confusions are focused on more. Right now, as a side effect of performance optimization, they end up causing headaches for people, but Valgrind/sanitizers are an interesting look into what compiling to an "x86-64 for security testing" architecture looks like. The logical next step is even more exciting, because we're actually starting to deploy real architectures with security-focused features that will require ports that are every bit as concrete as any other physical architecture difference, which I think will "legitimize" this mindset to the people who I called deluded at the start of this now very rambly comment. Page protections mean that "const" is not something you can ignore. Pointer signing can mean that your "but they're the same bits underneath!" type confusions are no longer valid. ARM's Morello now means you can't play fast and loose with your pointers anymore; they're 128+ bits and you can't just forge one out of an integer anymore without caring. Ports to these architectures absolutely rely on the existence of a C abstract machine, which has served pretty well considering that its existence is really just what a piece of paper says is legal or not, rather than something really planned beforehand.
Thank you for taking the time to explain this, I've seen numerous threads where people complain about C but lack enough understanding of it to know where to properly attribute the blame.
C shares one common trait with assembly: the ability to access, interpret, and modify memory freely. It is also an intensely manual language and expects the programmer to be both discriminating and thorough when dealing with unexpected values. Beyond that, they should not be compared.
The language is not forgiving and as such has earned resentment from programmers who have had the benefit of using other languages that lessen the burden on the developer. Truthfully, all programming languages have an area where they excel (even ones we don't enjoy), and oftentimes the approach or requirements of a project determine the language that should be used. Many complain about C generally as being inadequate but fail to provide the context in which the language is employed, in which case it would be obvious that they should use another language.
I would also like to add that scale (as in LoC or project complexity) is an important factor to consider when selecting an appropriate language. C can be much easier to manage for smaller executables/libraries or projects that don't have many layers of abstraction. At the same time, modern software and application complexity has grown significantly since C's inception, and C is often not the ideal solution.
Discussions involving languages are enjoyable here when there is deliberation but I loathe when they devolve into tribalistic posturing and whinging.
> C shares one common trait with assembly: the ability to access, interpret, and modify memory freely
In reality that is a trap; C makes you think that you might be able to poke memory in all sorts of ways, but in reality there are a lot of subtle restrictions around memory access. The whole discussion around pointer provenance is the tip of the iceberg here.
And that is pretty much at the core of the whole UB hullabaloo: the difference between what C seems to be (or to have been) and what the standard says.
To see what cross-platform C looked like in those early days, there's nothing like reaching for books like "A Book on C" from 1984 (Robert Edward Berry and B. A. E. Meekings), which has an implementation of RatC [0].
I fully agree with this. For example, that assigning a freed pointer in C is UB is not because of optimization, but because there were real-world architectures with memory segmentation where loading such a pointer caused a run-time trap (e.g. 286 protected mode). Or that reading an uninitialized automatic variable is UB because it could cause a trap on architectures that could detect this (e.g. IA-64). There were also C versions with bounds checking etc. That compilers which are popular today focus on exploiting UB for optimization instead of security is an implementation choice, not a fundamental problem of the language itself.
I like this description. It's a useful mental model.
But in summary, doesn't that just move the target from "The compiler is stupid, it shouldn't be doing this, it clearly should know what I mean and I didn't mean that!" to "This is a bad lowest common denominator; if any architecture really needs this guarantee or for things to behave like this, then it should pay a performance penalty. We shouldn't all have to pay the portability price for this one thing that isn't an issue anywhere else."
And to be honest, most of the UB hate I see is about the latter, not the former, no?
Most of the UB hate is that bugs that always existed only recently became exposed. "This code worked fine for years why is the compiler breaking it!" is always the rant, but it's misplaced. It should instead be "why didn't I get a sanitizer/linters/debug-whatever error first?"
The proliferation of optimization passes outpaced decent debuggability, and that's really the problem. Rants against UB are nearly always irrelevant or even just outright wrong. And worse still, those crusaders are harmful. You can see this in Rust as a perfect example. Signed integer overflow is defined as two's complement wrapping, much rejoicing from the "UB always bad!" crowd. Except wait a minute, in a debug build of Rust it's defined to be a panic. Why? Because signed integer overflow is 99.999% of the time a bug, and defining how it overflows doesn't actually help anyone. So instead you're left with the worst of both worlds - you both can't rely on how signed ints behave in Rust as a programmer because they have 2 extremely incompatible defined behaviors, and the optimizer/runtime then can't take advantage of them being undefined behavior in practice in release builds to optimize better.
It boils down to a culture problem, while communities in safer systems programming languages embrace having a panic on signed integer overflow, in the C languages world suggesting the use of -ftrapv (or similar) will make them reach out for the pitchforks.
The linters and compiler security flags are there, the problem is getting them adopted.
Of course, a panic is also a failure. It may be a less serious one, or it may just make your rocket explode on take-off and kill anything down-range, while the result of the incorrect computation would otherwise have been irrelevant.
One of the difficulties I've had with the 'safer systems programming languages' advocacy is that since something going wrong is inherent and unavoidable -- since the flaw is ultimately in the user's code -- there is a tendency to pretend that the panic isn't something going wrong. In my experience this has resulted in measurably lower quality code from these communities, code which panics in slightly unexpected conditions -- while something written in C would not (yet may fail in a worse way when it does fail).
I don't think I've yet managed to download and run anything written in rust where it doesn't panic within the first 15 minutes of usage-- except the rust compiler itself and firefox (though I do now frequently get firefox crashes that are rust panics).
It may well be that the increased runtime sensitivity to programmer errors in these languages inherently mean we should expect more runtime failures as previously benign mistakes are exposed, and ought to accept that software written in these languages may be less reliable on aggregate because when it does fail its less likely to create security problems and that this is a worthwhile tradeoff. (Python users sure seem to survive a near constant rate of surprising runtime failures...)
But to the extent and so long as language advocates pretend that panics aren't failures they can't really advocate for the trade-off, advance better static analysis to reduce the gap, and will continue to seem fundamentally dishonest to people who try to use the languages and software written in them and experience the frequent panics first hand.
The difference between Rust and Java is that Rust developers decided that panics shouldn't be recoverable except as an afterthought for C compatibility.
In principle most panics are amenable to retrying the operation, which would be the equivalent of catching exceptions in Java. So yes, you get an "error has occurred" warning, but your program doesn't terminate immediately. I don't think C has an edge over Java here.
How common is it for java code to handle exceptions in useful ways rather than just fail in even more inexplicable ways due to no one ever having conceived of much less tested those code paths being executed?
A panic makes an error situation visible; the C way can let an error situation go unnoticed for longer than expected, corrupting data in more unrecoverable ways than just crashing right there on the spot.
A bit like having warnings as errors, or deciding to ignore warnings at the peril of what might come later, without the feedback of what those warnings were all about.
Yes, but visible at runtime. Depending on the situation you may well prefer* the silent failure. Many such silent failures are completely benign, e.g. the result of the wrong code (or whatever it corrupted) wasn't subsequently used.
*would prefer if you actually got to pick. But you don't get to pick because once you know of the bug you fix it either way.
Warnings as errors isn't a great example, because if you do it in code distributed to third parties it's an absolute disaster, as the warnings are not stable and there are constantly shifting false positives. It's perhaps not a good example even without distributing it, because it can lead to hasty "make it compile" 'fixes' that can introduce serious (and inherently warning-undetectable) bugs. It's arguably better to have warnings warn until you have the time to look at them and handle them seriously, so long as they don't get missed.
The parallel doesn't carry through to undefined behavior because the undefined behavior isn't logging a warning that you could check out later (e.g. before cutting a release).
However, culture results in artefacts. You mostly won't find American football stadiums in England's cities, because the game isn't part of their culture. If the English suddenly took to it, such stadiums would likely still take several decades to become widespread.
C libraries like OpenSSL reflect what's culturally appropriate in that language, so even if you came to C from a language with a different culture, too bad it has the culturally appropriate API design and behaviour.
I think that OpenSSL has historically reflected a rather antiquated C culture that most software moved on from long ago, FWIW.
A clear example of this is OpenSSL intentionally mixing uninitialized memory into its randomness pool (because on some obscure and long forgotten platforms it was the only way they had to get any 'randomness'), resulting in any programs written using it absolutely spewing valgrind errors all over the place. (Unless your openssl has been compiled with -DPURIFY to skip that behavior, or had the debian "fix" of bypassing the rng almost completely :P ).
I think the OpenSSL situation you're talking about arises because of a mistake by a maintainer.
MD_Update(&m,buf,j);
Kurt Roeckx found this line twice in OpenSSL. Valgrind moaned about this code and Kurt proposed removing it. Nobody objected, so in Debian Kurt removed the two lines.
One of these occasions is, as you described, mixing uninitialized (in practice likely zero) bytes into a pool of other data, and removing it does indeed silence the Valgrind error and fixes the problem. The other, however, is actually how real random numbers get fed into OpenSSL's "entropy pool"; by removing it there is no entropy, and the result was the "Debian keys" - predictable keys "randomly" generated by affected OpenSSL builds.
I haven't seen OpenSSL people claim that the first, erroneous, call was somehow supposed to make OpenSSL produce random bits on some hypothetical platform where the contents of uninitialised memory doesn't start as zero, it looks more like ordinary C programmer laziness to me.
The odd thing with that incident is that the "PURIFY" define long predated it-- the correct fix in debian should have been "Just compile with DPURIFY"-- I believe redhat was already doing so at the time.
> I haven't seen OpenSSL people claim that the first, erroneous, call was somehow supposed to make OpenSSL produce random bits on some hypothetical platform where the contents of uninitialised memory doesn't start as zero
I had an openssl dev explain (in person) to me, when I complained about the default behavior, that there had been platforms that depended on that behavior, that they weren't sure which ones did, and so it didn't seem safe to eliminate it. (I'd complained because I couldn't have users with non -DPURIFY openssl code run valgrind as part of troubleshooting). IIRC the use of uninitialized memory was intentional and remarked on in comments in the code.
- If the "uninitialized" data is actually somehow some kind of interference.
- In LLVM, using an "undef" value will not always do the same thing each time; however, the "freeze" instruction can be used to avoid that problem. (I don't know if this feature of LLVM can be accessed from C code, or how similar things work in GCC.)
- If the code seems unusual, then you should write comments to explain why it is written in the way that it is. (You can then also know what considerations to make if you want to remove it.)
- Whether or not there is uninitialized data, you will need to mix in proper entropy too, from other sources of real entropy.
>So instead you're left with the worst of both worlds - you both can't rely on how signed ints behave in Rust as a programmer because they have 2 extremely incompatible defined behaviors, and the optimizer/runtime then can't take advantage of them being undefined behavior in practice in release builds to optimize better.
I don't understand how this is the worst of both worlds.
You can explicitly define overflow behavior in Rust. There are wrapper types and explicit checked or saturating and wrapping operations if those are necessary for the correctness of your program. If your program doesn't rely on them then checked overflow being the default in debug builds is the way to go and given enough confidence in the final product it makes sense to drop them in release builds and given enough processor advancements we can also do checked overflow in release builds.
I can do that in C/C++, where regular signed integer overflow is otherwise undefined behavior. The point is just that Rust defining the behavior (no UB!) didn't do a damn thing to help anyone, since if you actually want and expect overflow you need to use specific functions/wrappers to do that anyway. You also probably want the carry flag anyway, so having regular addition be "defined behavior" is still useless.
Yep, exactly this. There's a handful of undefined behavior that might actually be worth reconsidering, but almost all UB that people want turned into defined behavior are "yeah we had a bug let's make it do something about as bad but call it defined".