> [undefined behavior for optimization] Yes, I am aware that that is the excuse....

saagarjha · on April 24, 2018

> if you think "You have violated something, for which we will give you no diagnostic, and therefore we feel free to crash you at some random other place in the program that has nothing to do with the place where the alleged violation took place, again with no diagnostics" is reasonable

No, I generally don't, which is why I use a safer language like Swift most of the time (well, that, and the fact that I can take advantage of a nicer standard library). You put "safer" in quotes, which makes me think you don't quite understand how compiler optimizations are supposed to work. I think Chris Lattner's three-part series, "What Every C Programmer Should Know About Undefined Behavior"[1], is a great explanation from a compiler writer for why dangerous optimizations have to exist. It certainly helped me when I was in a similar place as you, not quite understanding why the optimizer did seemingly stupid things.

Really, the crux of the issue is that every language has tradeoffs: you can program in assembly and know exactly what your program is doing, but you lose the portability and convenience of higher level languages. Then you have the C family of languages, where you get access to some higher level concepts at the cost of ceding control to a compiler. The compiler's job is to generate assembly that matches what you are trying to do in the most efficient way possible. Of course, if it did so too literally it would be very slow to account for every single "stupid" thing you could have done, so there are some general rules that are imposed that you must follow in order for the compiler to do what you want. Then, of course, we have the high-level languages which do account for every stupid thing you might do, and so can provide proper diagnostics.

[1] http://blog.llvm.org/2011/05/what-every-c-programmer-should-...

mpweiher · on April 24, 2018

I have programmed in C since ~1986, so please don't try to explain the language to me, and don't assume that my POV comes from a place of ignorance.

The craziness with undefined behavior is a fairly recent phenomenon. In fact, I started programming in C before there even was a standard, so all behavior was "undefined", yet no compiler manufacturer would have dreamed of taking the liberties that are taken today.

Because they had paying customers.

The actual benefits of the optimizations enabled are fairly minimal, and the cost is insane, with effectively every C program in existence suddenly sprouting crazy behavior, behavior that used to not be there.

Yeah, and while I know Chris personally, like and respect him, I am not taking his word for it.

> The compiler's job is to generate assembly that matches what you are trying to do in the most efficient way possible

Exactly: "matches what you are trying to do". The #1 cardinal rule of optimization is to not alter behavior. That rule has been shattered to little pieces that have now been ground to fine powder.

Sad times.

See: Proebsting's law, "The Death of Optimizing Compilers" and "What every compiler writer should know about programmers or “Optimization” based on undefined behaviour hurts performance"

saagarjha · on April 24, 2018

> I have programmed in C since ~1986, so please don't try to explain the language to me, and don't assume that my POV comes from a place of ignorance.

I apologize for my tone; it was more patronizing than I had intended it to be.

> The craziness with undefined behavior is a fairly recent phenomenon. In fact, I started programming in C before there even was a standard, so all behavior was "undefined", yet no compiler manufacturer would have dreamed of taking the liberties that are taken today.

I feel that the current renewed focus on optimizing compilers has really been born out of the general slowing of Moore's law and stagnation in hardware advances in general, as well as improvements in program analysis taken from other languages. Just my personal guess as to why.

> Exactly: "matches what you are trying to do". The #1 cardinal rule of optimization is to not alter behavior. That rule has been shattered to little pieces that have now been ground to fine powder.

The optimizing compiler has a different opinion than you do of "altering behavior". If you're looking for something that follows what you're doing exactly, write assembly. That's the only way you can guarantee that the code you have is what's being executed. A similar, but not perfect, solution is compiling C at -O0, which matches the behavior of older compilers: generate assembly that looks basically like the C code that you wrote, and perform little to no analysis on it. Finally, we have the optimization levels, where the difference is that you are telling the compiler to make your code fast; however, in return, you promise to follow the rules. And if you hold up your side of the bargain, the compiler will hold up its own: make fast code that doesn't alter your program's visible behavior.
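
To make "the rules" concrete, here is a minimal sketch of one of them: signed integer overflow is undefined in C. The function names are made up for illustration and are not from any real codebase. The first check only works if the addition wraps around, which the language does not promise for `int`, so an optimizing compiler may fold it away. The second asks the same question without ever overflowing.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: relies on a + b wrapping when it overflows.
 * Signed overflow is undefined behavior, so for b > 0 an optimizer is
 * allowed to assume "a + b < a" is never true and drop the test. */
bool add_would_overflow_naive(int a, int b) {
    return b > 0 && a + b < a;
}

/* Hypothetical helper: same question, asked without overflowing.
 * INT_MAX - b (for b > 0) and INT_MIN - b (for b <= 0) always fit in int. */
bool add_would_overflow(int a, int b) {
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}
```

At -O0 the two will typically agree on common hardware; with optimizations on, only the second one is guaranteed to keep answering the question you asked.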

mpweiher · on April 25, 2018

> The optimizing compiler has a different opinion than you do of "altering behavior".

Obviously. And let's be clear: the optimizing compilers of today. This rule used to be inviolable, now it's just something to be scoffed at, see:

> If you're looking for something that follows what you're doing exactly, write assembly.

Er, no. Compilers used to be able to do this, with optimizations enabled. That this is no longer the case is a regression. And shifting the blame for this regression to the programmers is victim blaming, aka "you're holding it wrong". And massively counter-productive and downright dangerous. We've had at least one prominent security failure due to the compiler removing a safety check, in code that used to work.

> Finally, we have the optimization levels, where the difference is that you are telling the compiler to make your code fast;

Hey, sure, let's have those levels. But let's clearly distinguish them from normal operations: cc -Osmartass [1]

[1] http://blog.metaobject.com/2014/04/cc-osmartass.html

saagarjha · on April 25, 2018

The article you linked to in your blog post is most likely not serious; it's a tongue-in-cheek parody of optimizing compilers, though one that's written in a way that brings it awfully close to invoking Poe's Law.

But back to the main point: either you can have optimizations, or you can have code that "does what you want", but you can't have both. OK, I lied, you can have a very small compromise where you do simple things like constant folding and keep with the intent of the programmer, and that's -O0. That's what you want. But if you want anything more, even simple things like loop vectorization, you'll need to give up this control.

Really, can you blame the compiler? If you had a conditional that had a branch that was provably false, wouldn't you want the compiler to optimize it out? Should the compiler emit code for

  if (false) {
  	// do something
  }

In the security issue you mentioned, that's basically what the compiler's doing: removing a branch that it knows never occurs.
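
A sketch of the pattern I have in mind, assuming the issue in question has the familiar "check after dereference" shape (the struct and function names here are invented for illustration, not taken from the actual code):

```c
#include <stddef.h>

struct device { int flags; };  /* hypothetical type, for illustration only */

/* Dereferences first, checks second. Dereferencing a null pointer is
 * undefined behavior, so the compiler may conclude dev != NULL after the
 * first line and treat the check as a branch that can never be taken. */
int get_flags_check_late(struct device *dev) {
    int flags = dev->flags;
    if (dev == NULL)
        return -1;
    return flags;
}

/* Checks first: the branch is reachable, so it has to stay. */
int get_flags_check_first(struct device *dev) {
    if (dev == NULL)
        return -1;
    return dev->flags;
}
```

From the optimizer's point of view, the `if` in the first function is the same as `if (false)`; in the second it is not.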

mpweiher · on April 25, 2018

> either you can have optimizations,

> or you can have code that "does what you want",

> but you can't have both.

This is simply not true. And it would be horrible if it were true. "Code that does what I want" (or more precisely: what I tell it to) is the very basic requirement of a programming language. If you can't do that, it doesn't matter what else you can do. Go home until you can fulfill the basic requirement.

> very small compromise

This is also not true. The vast majority of the performance gains from optimizations come from fairly simple things, but these are not -O0. After that you run into diminishing returns very quickly. I realize that this sucks for compiler research (which these days seems to be largely optimization research), but please don't take it out on working programmers.

What is true is that you can't have optimizations that dramatically rewrite the code. C is not the language for those types of optimizations. It is the language for assisting the developer in writing fast and predictable code.

> even simple things like loop vectorization

I am not at all convinced that loop vectorization is something a C compiler should do automatically. I'd rather have good primitives that allow me to request vectorized computation and a diagnostic telling me how I could get it.
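
For illustration, a rough sketch of the closest thing standard C offers today (the function name is made up, and this is my assumption of a reasonable example, not a proposal): `restrict` is a C99 promise from the programmer that the arrays don't overlap, and both GCC and Clang can be asked to report which loops they did not vectorize.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i].
 * "restrict" tells the compiler that x and y never alias, which removes
 * one common obstacle to vectorizing the loop. It is a promise, not a
 * request: breaking it is itself undefined behavior.
 *
 * To see what the vectorizer decided, one can compile with e.g.
 *   gcc   -O3 -fopt-info-vec-missed
 *   clang -O3 -Rpass-missed=loop-vectorize
 */
void scale_and_add(size_t n, float a,
                   const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

That is still the compiler deciding on its own, with the diagnostic being opt-in, which is the opposite of the default I'd like.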

C is not FORTRAN.

As another example: condensing a loop that you can compute the result of at compile time. Again, please tell me about it, rather than leaving it in without comment and "optimizing" it. Yes, I know you're clever, please use that cleverness to help me rather than to show off.
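
A sketch of the kind of loop I mean (the function names are mine, for illustration only). An optimizer is allowed to turn the first function into something like the second, and as far as I know some do, without saying a word:

```c
/* Sums 1..n the obvious way. */
unsigned long long sum_loop(unsigned int n) {
    unsigned long long total = 0;
    for (unsigned int i = 0; i < n; i++)
        total += i + 1ULL;
    return total;
}

/* The closed form the loop can be condensed to: n * (n + 1) / 2.
 * Computed in unsigned long long, which is at least 64 bits wide. */
unsigned long long sum_closed_form(unsigned int n) {
    return (unsigned long long)n * (n + 1ULL) / 2;
}
```

If the compiler can see that, I'd like it to say so, so that I can write the second version myself.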

> Really, can you blame the compiler?

Absolutely, I can.

> If you had a conditional that had a branch that was provably false,

"Provable" only by making assumptions that are invalid ("validated" by creative interpretations of standards that have themselves been pushed in that direction).

> wouldn't you want the compiler to optimize it out?

Emphatically: NO. I'd want a diagnostic that tells me that there is dead code, and preferably why you consider it to be dead code. Because if I write code and it turns out to be dead, THAT'S A BUG THAT I WANT TO KNOW ABOUT.

This isn't rocket science.

> security issue you mentioned, that's basically what the compiler's doing: removing a branch that it knows never occurs.

Only for a definition of "knows" (or "never", take your pick) that is so broad/warped as to be unrecognizable, because the branch actually needed to occur and would have occurred had the compiler not removed it!

> The article you linked to in your blog post is most likely not serious

I think I noted that close relationship in the article, though maybe in a way that was a bit too subtle.

saagarjha · on April 27, 2018

Hmm…let's try a simpler question, just so I can get a clearer picture of your opinion: what should the compiler do when I go off the end of an array? Add a check for the bounds? Not put a check and nondeterministically fail based on the state of the program? How about when you overflow something? Or dereference a dangling pointer?
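
To make the first option concrete, here is a minimal sketch of what an inserted bounds check amounts to. The helper name and signature are hypothetical; C does not carry array lengths around, so the length has to be passed in explicitly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical checked accessor: reads arr[idx] into *out only if idx is
 * within [0, len). Returns false instead of reading out of bounds. */
bool checked_get(const int *arr, size_t len, size_t idx, int *out) {
    if (idx >= len)
        return false;
    *out = arr[idx];
    return true;
}
```

Every access pays for the comparison and the branch, which is the cost C chose not to impose by default.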

You seem to not be OK with allowing the compiler to trust the user to not do bad things–but you do trust them enough to out-optimize the compiler. Or am I getting you wrong?