I always encourage people to go read Dijkstra's GOTO paper, instead of just its title. It's a short and easy read, almost like a blog post. If you pay attention, you can see that he was talking about spaghetti code vs structured code. It's better when the lexical structure of the source code maps to the execution structure. That is, if you know which line is currently being executed, you have a good idea of what the global state is, which lines executed before this one, and which will be executed next.
The thing is that many people today have never encountered the sort of spaghetti code that Dijkstra was talking about in 1968. There's plenty of confusing and messy code around, but true spaghetti code that GOTOs all over the place and is nigh-impossible to follow has been extremely rare for a long time. I can't recall encountering it in the last 30 years.
It's easy to misunderstand what he was even talking about because the paper is so short, assumes you know this context, and has no concrete examples. People quite reasonably assume it's about the ugly code they've encountered, but it's actually about ugly code of a completely different kind.
I'm not that old, but I was unlucky enough to have programmed, in my teens, in an unstructured language where GOTO was the only way to build faux-subroutines. Whatever you think of as code that's difficult to follow: it's nothing compared to this.
> The thing is that many people today have never encountered the sort of spaghetti code that Dijkstra was talking about in 1968.
Can't highlight this enough. The type of spaghetti code "goto considered harmful" was reacting to is basically impossible to create anymore, so anyone who didn't work on that type of code in the 80s or earlier probably hasn't seen it.
And thus, people apply the mantra "goto considered harmful" incorrectly. (Such as trying to avoid it in C for clean error handling, when there's no reason to avoid that.)
To try to replicate that experience, you'd have to write your entire application (including all libraries since libraries were often not a thing) as one single function in C. All of it, no matter how many tens of thousands of lines of code, all in one function. Then label every line. Then picture having GOTOs going every which way to any of the labels. For instance you'd preset the counter variable to some desired value and jump right into the middle of a loop elsewhere. Over time you'd surely accumulate special conditions within that loop to jump out. And so on. It's difficult to even imagine code like this today (or in the past 30 years).
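To make it a little less hard to imagine, here is a tiny, deliberately silly, but complete C program in roughly that style: preset the counter, jump straight into the middle of the loop, and bolt an escape hatch onto it later. (The variable names and the arithmetic are made up for illustration; real examples were thousands of lines of this.)

#include <stdio.h>

int main(void) {
    int i;
    int total = 0;

    i = 7;                     /* preset the counter by hand ...      */
    goto inside_loop;          /* ... and jump straight into the loop */

loop_top:
    i = 0;
inside_loop:
    total = total + i;         /* the "body" of the loop              */
    if (total > 100) goto bail_out;   /* special case bolted on later */
    i = i + 1;
    if (i < 20) goto inside_loop;
    goto loop_top;             /* and around we go again              */

bail_out:
    printf("total = %d\n", total);
    return 0;
}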
Sure it's possible to have horrible spaghetti today. Just look at any pubsub architecture based system and tell me what piece of code executes after another. It's super popular, and it's GOTOs all over again, just with data instead.
This is a good observation, thanks. Yes, some of these cloud-native patterns do become what is essentially a spaghetti flow pattern, even if not in the fragmented pieces of code directly.
Async-await in general has all the same pitfalls. In fact, async programming, being based on reifying program continuations, is a GOTO equivalent in a quite literal sense.
I agree, the closest modern equivalent is code that branches too much from too many places. Modern tools still help a lot to reason about it, but it gets to the point where I, personally, have to create call flow diagrams on pen and paper or a whiteboard to actually understand some flows.
Code with extreme branching; state/boolean parameters that determine which branch is taken; error handling that creates yet more branches of execution: all of that is really hard to keep in mind when reading such nightmare codebases...
> The type of spaghetti code "goto considered harmful" was reacting to is basically impossible to create anymore, so anyone who didn't work on that type of code in the 80s or earlier probably hasn't seen it.
It's still quite possible in assembly, where goto (JMP) is your only way to do control flow. But I doubt there are many people left who write and maintain large assembly programs. I imagine most programmers reach for C or something higher level as soon as the program becomes non-trivial.
I still use this cute online Intel 4004 simulator sometimes when I teach programming:
It's a fun challenge for novice and advanced programmers alike to write little programs in assembly for a CPU from 1971. The assembly language[1] has only 45 instructions, and you only need a handful of them anyway. The CPU interpreter is simple enough that you can literally see it think.
> It's still quite possible in assembly, where goto (JMP) is your only way to do control flow.
Yes, it's certainly as possible in assembly as it ever was. But as you note, few people are doing large-scale assembly programs anymore. And I'd say that those who still do are sufficiently experienced to avoid unstructured jump explosion in their code, hopefully.
I learned coding in the 90s and I've seen that kind of code. While BASIC itself already advanced to the point where structured conditionals and loops were available pretty much everywhere, plenty of code that was written earlier was still around.
Technically nobody stops you from writing one giant function with labels and goto, it's just not the most obvious path even to the most inexperienced programmers.
> Such as trying to avoid it in C for clean error handling, when there's no reason to avoid that.
Dijkstra would clearly disapprove of this use of goto. But he would blame the C language for making it necessary. Languages with structured cleanup (like the using statement in C#) do not need gotos for resource cleanup.
Dijkstra's argument does not distinguish between short and long gotos, or long or short functions. His argument applies to any use of goto.
If you have ever seen someone try to construct a bunch of nested IF statements with complicated conditional clauses, you might think GOTO is not so bad. People have simply become better coders. There are also still GOTOs that are used in specific cases, such as CONTINUE and BREAK - no labels required.
If I look back, it always comes back to naming and managing the names of things. GOTO 100 is meaningless, and one eventually runs out of meaningful names for GOTO labels. For me, OOP addressed the naming issue relatively effectively by using the data type as a namespace of sorts.
CONTINUE and BREAK are quite different from GOTO in that they operate predictably given the current scope: their limitations make them incapable of creating the unstructured nightmare that Dijkstra was talking about. They're similar to a GOTO only in that they compile to a jump, but so do IF statements and FOR loops.
Structured programming wasn't about eliminating jumps, it was about enforcing discipline in their use. The simplest way to do that is to eliminate raw GOTO from the language, but it's also possible to just be careful and use it wisely.
Not too many things make me shake my head harder than folks who consider continue/break to be GOTO equivalents. For the reasons you eloquently said.
Additionally, far more often than not, continue/break allow you to avoid another form of complexity, bugs, and low comprehensibility: deeply nested conditionals.
CONTINUE and BREAK are simply jumps to the beginning of or just past the end of the current loop context. They are equivalent to GOTOs to particular program offsets without the programmer needing to create labels for those offsets. They do not have any magical meaning beyond that. You could even call them syntactic sugar.
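To spell that out, here's a minimal C sketch of what a compiler effectively does (the label names loop_head/loop_continue/loop_end are of course hypothetical; real compilers emit anonymous jump targets):

#include <stdio.h>

int main(void) {
    /* structured version */
    for (int i = 0; i < 10; i++) {
        if (i == 3) continue;   /* jump to the next iteration */
        if (i == 7) break;      /* jump just past the loop    */
        printf("%d\n", i);
    }

    /* the same control flow spelled out with goto */
    int i = 0;
loop_head:
    if (!(i < 10)) goto loop_end;
    if (i == 3) goto loop_continue;   /* CONTINUE */
    if (i == 7) goto loop_end;        /* BREAK    */
    printf("%d\n", i);
loop_continue:
    i++;
    goto loop_head;
loop_end:
    return 0;
}

Both halves print the same thing (0 1 2 4 5 6); the difference is only in whether the jump targets are implicit or spelled out.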
Structured if/then/else is also merely syntactic sugar over if/goto, but that doesn't make it any less useful.
What makes break/continue (including labelled variants a la Java) useful is the fact that the restriction on where they can jump means that the control flow graph is guaranteed to be reducible. That is not the case with free-form goto.
They're not syntactic sugar in any language that does not have GOTO, because the semantics of GOTO-free languages don't allow arbitrary jumps, so there is no equivalent syntactic structure that you can compile BREAK to.
The distinction matters because the whole premise of Dijkstra's argument is that if you replace the GOTO keyword with a bunch of more limited versions that cannot be used to produce spaghetti, code quality would go up. The only way for that to work is for the language to be semantically incapable of expressing GOTO.
As I said in my other reply, you seem to have the subtyping relationship wrong: GOTO is a subtype of BREAK (anywhere you find a BREAK you could replace it with GOTO), but BREAK is not a GOTO (you cannot do the reverse).
See my other reply. GOTO is the generic type because it can be used to jump anywhere. BREAK/CONTINUE are sub-types because they are limited in where they can jump. BREAK/CONTINUE can be always implemented using GOTO, but not the other way around.
I agree that BREAK/CONTINUE are not syntactic sugar in languages that don't have GOTO.
No, you're still mixing up the subtype relationship.
Type X is a subtype of type Y if and only if an instance of X can always be used where an instance of Y is required.
GOTO can always be used to replace a BREAK. Therefore GOTO is a subtype of BREAK.
BREAK cannot always be used to replace a GOTO. Therefore BREAK is not a subtype of GOTO.
The inheritance relationship here is not single, it's multiple: a GOTO is a BREAK, but it is also a CONTINUE and a whole lot of other things. It's like a monster class that inherits from every interface under the sun and can do just about anything.
Dijkstra was basically advocating for refactoring our languages to extract those capabilities into smaller, more focused keywords (as well as dropping most of the functionality). Rather than having one keyword implement both the BREAK and CONTINUE interfaces, we break them out into separate keywords.
We apparently disagree on whether the general, flexible construct is the subtype or whether the specific, constrained construct is the subtype. You seem to be thinking from an object-oriented programming class hierarchy perspective, while I am thinking from a set theory perspective (i.e., the set of operations that can be done with GOTO is a superset of those that can be done with CONTINUE or BREAK).
At this point, I don't care which you call the subtype. You can claim that as a win if you want, but having spent so much time on this stupid thread I think we've both lost.
I actually rather enjoyed the conversation and didn't feel it was wasted at all, but I'm sorry you didn't feel the same. I wasn't in it to win, just to explore the idea.
I'm still interested in exploring the idea, but you're welcome to tune out at any point.
> the set of operations that can be done with GOTO is a superset of those that can be done with CONTINUE or BREAK
Yes! And this is actually part of my point. If Y is a subtype of X, then the set of valid operations on Y is a superset of the set of valid operations on X. This is true for any types, by the definition of subtyping.
This means that you're absolutely correct that the set of operations GOTO can perform is a superset of those BREAK can perform, and for this very reason GOTO is a subtype of BREAK.
The reason why I'm focused on the types and not the operations is because the question at hand has been whether BREAK has the same flaws as GOTO. My argument is that this hinges on whether or not BREAK is just a type of GOTO.
Yes, it's all compiled to jumps... The point of the discussion is that things like continue and break are easier to read and reason about because they can't just jump anywhere.
My point was to contest GP's assertion that CONTINUE and BREAK were not equivalent to GOTOs.
I agree that CONTINUE and BREAK are easier to reason about because you can look at them and instantly know what they do without having to look up what label they're jumping to.
My point is that it's meaningless to make the argument that CONTINUE and BREAK can be implemented with GOTO, because every control flow structure can be. That you could use a GOTO to implement them isn't in question, what's in question is if you could do the reverse.
It's a subtyping problem, and you have the is-a relationship backwards: a cat is an animal but not every animal is a cat. GOTO is a BREAK (could always be substituted for one), but a BREAK is not a GOTO.
When you need a BREAK you could implement that in terms of GOTO, but no amount of coercion will allow you to use a BREAK as a generic GOTO.
> GOTO is a BREAK (could always be substituted for one), but a BREAK is not a GOTO.
You wrote this backwards, but you seem to understand the relationship and that GOTO is more general. That is, every BREAK is a GOTO (because you can always substitute a GOTO), but every GOTO is not a BREAK (i.e., you can't substitute a BREAK for some GOTOs because BREAK cannot jump to an arbitrary label).
No, I wrote it in exactly the order I wanted to. Because of the substitution property that you acknowledge, GOTO is a subtype of BREAK. BREAK is not a subtype of GOTO because it cannot always be substituted for GOTO. Thus, "GOTO is a BREAK, but a BREAK is not a GOTO."
A GOTO is just one possible implementation of BREAK, just as a cat is one possible implementation of an animal.
The practical impact of this is that it is incorrect to ascribe to BREAK the same weaknesses as GOTO, because BREAK is not a GOTO.
The moment I received a downvote for a simple opinion/observation about relieving naming overload and the similarities of BREAK and CONTINUE to GOTO, I knew where this was going :)
IF and WHILE are also equivalent to GOTOs in that sense.
The point is that CONTINUE and BREAK jump to exactly one location given their lexical context and cannot jump anywhere else. They are also only meaningful when applied to structured control flow. The problem with GOTO is the unbound nature of its jump target, which leads to control flow that is difficult to comprehend by looking at the lexical structure of a function.
The argument against gotos in Dijkstra's article would apply equally to breaks and continues, and even to early returns.
I don't fully agree with Dijkstra's argument. For example, I think early returns can often improve the readability of the code. But it's worth noting that Dijkstra is not primarily concerned with readability but rather with how to analyze the execution of a program.
As far as I could tell, coming in on the end of it, people like Dijkstra were primarily trying to write proofs about programs. That motivated them to ban constructs they didn't know how to analyze. Problem is that some of those things turned out to be trivially tractable, but lots of people never got the memo.
If you read Dijkstra's letter it wasn't about formal proofs. It was about go to statements being very hard to reason about, especially when trying to understand the flow of a program and how you got to a particular point in its execution. The word "proof" doesn't even show up in the letter. It's only a page or so, well worth a read instead of guessing at what he may have been writing about.
> CONTINUE and BREAK are quite different from GOTO in that they operate predictably given the current scope
CONTINUE, BREAK, and GOTO all operate predictably because they are deterministic operations. Each continues program execution at the directed explicit (goto) or implicit (continue or break) offset. There is no non-deterministic or unpredictable behavior whatsoever.
"Predictably" may have been the wrong word, because it implies the alternative is non-determinism. It might be better to say that CONTINUE and BREAK are limited: given a scope, the keyword can take you to exactly one place, while GOTO could be used to take you anywhere, and you have to go hunting for the corresponding label to find that place.
> operate predictably because they are deterministic operations
If I sat you in front of a computer generating numbers using a pseudo random number generator and gave you as context the last number it generated, could you make any prediction about the next number it generates?
Now, if it used a PRNG that was known and standardized to only ever compute one particular number, could you predict anything about the next number?
A better rule than "goto considered harmful" is that gotos should only go lower in the function, and should only exit blocks and/or skip over them, never be used to enter them.
Generally true, but I've been known to do `goto again;` for those cases where retrying is a corner case. Sure, you can put the entire code inside a `for(;;)` but if it almost always only runs once, you're not helping the reader understand the code.
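Something like this, roughly (a trimmed-down sketch; try_the_thing is a made-up operation standing in for whatever can transiently fail):

#include <errno.h>

/* made-up operation that can occasionally fail with EINTR */
static int try_the_thing(void) {
    /* ... real work here ... */
    return 0;
}

int do_the_thing(void) {
    int err;
again:
    err = try_the_thing();
    if (err == EINTR)      /* rare corner case: just retry */
        goto again;
    return err;
}

int main(void) {
    return do_the_thing();
}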
I'm fond of the MISRA C approach, where you have all these bright-line (even machine-checkable) maxims, but if you have a reason to break one you're just supposed to write up a report on why it's better to do it this way and how you've addressed the risks.
Seems like a reasonable trade for the occasional "goto again".
[Before anyone reads the above as advocating MISRA C: I think MISRA actually tells you not to do "goto fail;", which is advice I'm kind of dubious about. It also tells you not to do "good = good && side_effecty_thing();" (no short-circuiting operators when there are side effects), so its style has you make a typical function absolutely littered with explicit initialization guards.]
The first program listing is on page 24 of the PDF. Try to follow the logic of the program. Why does line 400 go to 280? What paths can lead to line 400? Who knows! And this is high-quality BASIC by 1979 standards — it’s in a printed book after all.
There’s an auxiliary listing after the program itself explaining the routines and variables used, but many/most programs in those days wouldn’t have this level of rigorous documentation. Deciphering the program would probably have to start by drawing a flowchart of execution paths.
Also worth noting the sort of wild limits of BASIC that are partially responsible for the code being spaghetti, including, effectively, an inability to add new lines to the code without adding a goto between previous statements.
GW-BASIC had a command to renumber all the lines. And the line numbers skip by 10 exactly for the purpose of inserting lines. Then when you had hit the limit you'd ask the computer to renumber them in steps of 10 again.
> GW-BASIC had a command to renumber all the lines. And the line numbers skip by 10 exactly for the purpose of inserting lines. Then when you had hit the limit you'd ask the computer to renumber them in steps of 10 again.
GW-BASIC was released about 4 years after this book, and there was a huge change over that period:
1976 - Release of Apple I
1977 - Release of Apple II / Commodore PET
1979 - This book
1982 - Commodore 64
1983 - GW-BASIC
This book is pretty much closer to the Apple I than to GW-BASIC. Perhaps I should have specifically said developing BASIC in 1979, as referenced in that book (and there isn't just one sort of BASIC - there are so many dialects).
Wow, that takes me back. I learned to program on a PET and would have devoured this book had it been available. As it was, one of our math teachers was tasked with teaching the computer classes but didn't have any programming knowledge beyond input/output, loops, and simple calculations. Books and manuals were hard to come by.
> And this is high-quality BASIC by 1979 standards — it’s in a printed book after all.
I don’t think that’s true, certainly not for books of that time period. Because the whole field was changing rapidly, writers would often work under tight schedules, and customers would buy just about anything because they only had magazines and books to learn from, and review sites didn’t exist.
I also think that’s bad Basic for the time. Certainly, comment lines before subroutines would help.
It looks pretty typical for BASIC of the late 70s to me.
You wouldn't want to waste characters on commenting code: machines of that era would have only a few KB of RAM, as low as 1K. For the same reason you don't want to waste characters on long, meaningful variable names or on well spaced code. Multiple statements per line isn't to save print space, it's to save RAM.
Meanwhile, the program's pretty well structured for such a short bit of BASIC: subroutines start at multiples of 100, for example, and each subroutine starts and ends clearly, no shenanigans like jumping from the middle of one sub to another, no multiple exit points for subs, all as linear as it can be. The use of IF is limited to skipping forward a short way to conditionally execute a line or two only. GOTO only exists in those IF statements.
I'd have been happy to have written code like this, back then.
I am pretty sure that my uncle ran this exact program on his computer and printed out biorhythms on listing paper, in the mid 80s.
I messed with some programs written in BASIC for industrial controllers in the late 1970s.
There is a simple thing: on a lot of machines, only spaghettified programs would even fit in the memory available. Academic CS researchers with their unlimited accounts on the institution's mainframe didn't have that worry.
I would argue that the average program that got printed in a book was probably quite bad but still of a higher quality than the programs people wrote on their own, simply because the latter were usually written without any education or useful models of working programs.
It’s like an iceberg of bad code: the underwater part nobody saw was astonishingly terrible by modern standards. That code might be running a business, but its author would never get exposed to professional programming. Today Excel often serves a similar purpose. (Excel isn’t spaghetti though since the execution model is completely different.)
That code looks normal to me. I have a ton of BASIC books and magazines. You're talking about a time period before full screen text editors were a thing. It's almost impossible to explain to anyone that didn't have to work with TI-99 BASIC, C64 BASIC, GW-BASIC/BASICA, etc. what it was like. Once you got to QBASIC/QuickBASIC it was done. Life was easy. A few years before that, and you're printing out pages on a dot matrix printer and going line-by-line to debug. You'll notice a distinct lack of white space between lines and that comments start with "REM" and a line number. You didn't even get labels for lines. The code looks like that because those were technology limits on really rudimentary devices. You were editing code inside the BASIC interpreter, often using some command like "LIST <line#>". It was awful.
We take so much for granted today. Dual monitors. Color. More than 80x25 character display. Multiple screens and multitasking. Just getting to Linux in '95 and having F1/F2/F3/etc. switching terminals was a huge deal.
I remember in Commodore basic, finding that a period by itself ('.') was parsed as zero, but was actually slightly faster than using zero, and saved a byte every time it was used. In other words, you could write:
10 for i = . to 6.28 step 0.1:next
and it would be slightly faster and smaller than
10 for i = 0 to 6.28 step 0.1:next
Made for some ugly inner loops, but you gotta do what you gotta do. For that matter, we certainly would have removed some of those extra spaces as well. Bytes mattered and whitespace slowed you down.
> The first program listing is on page 24 of the PDF. Try to follow the logic of the program. Why does line 400 go to 280? What paths can lead to line 400? Who knows!
Without looking at the post-program material, this isn't exactly a difficult question to answer.
Line 400 is preceded by some print statements:
370 PRINT "PRESS 'E' TO END, SPACE TO CONTINUE"
380 GET R$:IF R$="" THEN [goto] 380
390 IF R$="E" THEN [goto] 120
400 L=0:GOTO 280
So we have a prompt that says "press E to end, space to continue", and then branches one of three ways: if you provide no input, the prompt is shown again; if you provide an E, the entire program restarts from scratch, and if you do anything other than that, the count of lines drawn on screen is reset to 0 and the next 18 lines of the chart are drawn.
We can assume that line 400 will be hit whenever a piece of chart is drawn to the screen.
The program's structure here is a nested loop: there is a loop between lines 280 and 400 (displaying the chart indefinitely, 18 lines at a time) containing another loop between lines 300 and 360 (displaying 18 lines of a chart, one line at a time).
Why is this supposed to be an example of spaghetti code?
Pretty sure we had that program on our home computer in the 80s (I was a young kid but I distinctly remember a biorhythms program). What impresses me reading it now is the "y2k" compliance. If the year entered is only two digits, it adds 1900, otherwise it takes the full year.
To give people some kind of an idea of what it was created in response to: Imagine writing an entire program in one single main function. The only thing you're allowed to do for flow control is goto. You can do 'goto somelabel;' for an unconditional goto, or you can do 'if (somecondition) goto somelabel;' for a conditional goto. Here are some examples of how it would look if translated to something C-like:
Loops would look like:
int i = 0;
loop_start:
print i;
i = i + 1;
if (i < 10) goto loop_start;
Fizzbuzz would look something like:
int i = 0;
loop_start:
if (i % 15 != 0) goto not_fizzbuzz;
print "fizzbuzz";
goto done;
not_fizzbuzz:
if (i % 3 != 0) goto not_fizz;
print "fizz";
goto done;
not_fizz:
if (i % 5 != 0) goto not_buzz;
print "buzz";
goto done;
not_buzz:
print i;
done:
i += 1;
if (i < 100) goto loop_start;
Often, this wasn't just constrained to a single function; the whole program would be constructed like this, with gotos which jump back and forth across pages and pages of code. Languages wouldn't even have a call stack with subroutines (which is why "procedural" languages -- languages with procedures -- were important enough to be given a special name).
At least that's my understanding of it. I haven't lived through this, and my only experience with this kind of stuff is writing assembly, where we always make use of a call stack, so even that is in practice a procedural language. If I have gotten anything wrong, please correct me.
Procedural or not was more of a spectrum. If you look at early BASICs, for example, they had GOSUB, and it could recurse, so there was a return-address stack. But GOSUB did not have any provisions to pass arguments or return values - it was just a GOTO that remembered where it came from; you had to use globals to pass data around. So if you wanted a data stack (i.e. locals), you had to rig your own with arrays.
OTOH early FORTRAN had procedures with arguments and results, but no recursion.
Structural or not was also not necessarily all-in. FORTRAN and BASIC both had for-loops before they had structured conditionals.
Tangentially, can somebody point me to what are considered the best fizzbuzz solutions? I'm both an experienced and CS-educated coder, and I know what I would consider to be a good solution, but I have no idea what the rest of "you" are looking for. (My favored solution would be a small number of state machines running in parallel to sieve-of-Eratosthenes the correct answers, thus avoiding innumerable divisions, but maybe that's just me and I'm old fashioned?)
Almost, except in assembly, we always have a call stack. So even assembly is a procedural language, even though it doesn't otherwise have structured control flow.
I used to work at Microsoft on the Windows team. There it was very common to have a “goto cleanup” for all early exits. It was clean and readable. OTOH, I once was assigned to investigate an assertion error in the bowels of IE layout code. It was hundreds of stack frames deep in a recursive function that was over a thousand lines long and had multiple gotos that went forwards or backwards. That was an absolute mess and would have been impossible to debug without a recording (“time travel”) debugger.
> but true spaghetti code that GOTOs all over the place and is nigh-impossible to follow has been extremely rare for a long time.
Exactly! Recently I had the "pleasure" to work with some FORTRAN IV code from the early 60s, so I know what you mean. No functions/subroutines, only GOTOs. Even loops were done with labels. There is also a weird feature called "arithmetic IF statements" (https://en.wikipedia.org/wiki/Arithmetic_IF). Luckily the code was pretty short (about 500 lines including comments).
Those are just regular loops, though, right? I’m looking for the posttest loop, sometimes known as the do…while or until loop. There seems to be an unfortunate inconsistency in the naming of this thing.
Ah, right, I misread your post. DO WHILE is indeed a "normal" while loop. There doesn't seem to be an equivalent to "do { ... } while", as found in most C-style languages.
> There seems to be an unfortunate inconsistency in the naming of this thing.
Well, Fortran is older than C, so you cannot really blame them :-)
The funny thing is, I mostly program in Fortran (thus the interest in this construct). It is nice for expressing “this iterative method must be run at least once.” Unfortunately at some point I absorbed the name that comes from the C-ism, haha.
A less ambiguous name for those is repeat/until, as seen in Pascal and its descendants - I don't recall any language that uses that syntax for anything other than a postcondition loop.
I recall having to sort out spaghetti Fortran back in the '80s; numbers as labels (a la basic but with free-form numbering), computed gotos back and forth in the code, stuff like that. Learnt a lot from fixing that mess.
Forty years later, I'll still use a C goto if the situation warrants it (e.g. as a getout from a deep but simple if). Maybe because, having long been an assembler programmer as well, gotos are part of the landscape (if/else is effectively a conditional and unconditional branch/jump).
GOSUB was so much worse than GOTO in that it had a stack for the current line but no stack for variables so you could not write recursive functions. I think Fibonacci as a recursive function is malpractice but boy was it a hassle to write Quicksort in BASIC although I had no trouble coding up an FFT (1950s FORTRAN style) from first principles in BASIC on TRS-80 Model 100 on a bus ride across Vermont.
Funny though I did come to a conclusion that for the Arduino programs I wrote I didn’t need a stack at all.
Yeah I saw a codebase written in Fortran 77 (the program was written in 1988 or 1989 I believe) and geez... it is almost impossible to figure out what is going on. Programming has changed a lot.
> I'm not that old, but I was unlucky enough to have programmed in an unstructured language where GOTO was the only way to use faux-subroutines in my teens.
Was that a Casio calculator? Because it was like that for me, only GOTOs existed. Learning about C and seeing these things called loops was a revelation because I had reinvented them with GOTOs already in my programming.
I've seen full spaghetti on recently written C driver code for an IC. The device contained a 24-bit processor that wouldn't have a compiler and its one-man dev team was necessarily doing all of the firmware in assembly. He basically wrote all of the C with assembly style control flow, goto-ing all over the place and zero high level statements.
I read the paper when it first came out. At the time I was programming in Fortran, which had the three-branch arithmetic IF statement. As you pointed out, that was a special kind of hell. The paper rang very true. However, we did all take it to the extreme and go for zero gotos with quite a fervor.
I dug around a little and found an example [1] on a reddit thread looking for examples of spaghetti code. Most of the examples on the thread were just badly written code. Irreducible spaghetti code tends to be complex state machines that cannot be rendered well in a flat format. People like to flatten those out with a trampoline pattern[2], but that can hinder performance.
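For anyone who hasn't seen it, one common form of the trampoline pattern looks roughly like this (a minimal C sketch; the enum and state names are made up): each handler returns the next state instead of jumping to it, and one flat driver loop does the "jumping".

#include <stdio.h>

enum state { START, WORK, DONE };

/* each handler returns the next state instead of jumping to it */
static enum state do_start(int *n) { *n = 3; return WORK; }
static enum state do_work(int *n)  { printf("%d\n", *n); return --*n > 0 ? WORK : DONE; }

int main(void) {
    enum state s = START;
    int n = 0;
    while (s != DONE) {           /* the trampoline: one flat dispatch loop */
        switch (s) {
        case START: s = do_start(&n); break;
        case WORK:  s = do_work(&n);  break;
        default:    s = DONE;         break;
        }
    }
    return 0;
}

The extra indirect dispatch on every single step is where the performance cost mentioned above comes from.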
Malicious spaghetti involves transformations such as
for (x = 0; x < 10; x++) {
    for (y = 0; y < 20; y++) {
        printf("%d %d\n", x, y);
    }
}
|
| DRY
v
x = 0;
loop_x_head:
condition_val = x;
condition_stop = 10;
condition_var = 'x';
goto check_condition;
loop_x_body:
y = 0;
loop_y_head:
condition_val = y;
condition_stop = 20;
condition_var = 'y';
goto check_condition;
loop_y_body:
printf("%d %d\n", x, y);
increment_val = y;
condition_stop = 20;
increment_var = 'y';
goto increment;
loop_y_end:
increment_val = x;
condition_stop = 10;
increment_var = 'x';
goto increment;
loop_x_end:
halt;
increment:
increment_val++;
if (increment_var == 'x')
x = increment_val;
if (increment_var == 'y')
y = increment_val;
condition_val = increment_val;
condition_var = increment_var;
check_condition:
if (condition_val < condition_stop)
goto pass_condition;
if (condition_var == 'x')
goto loop_x_end;
if (condition_var == 'y')
goto loop_y_end;
pass_condition:
if (condition_var == 'x')
goto loop_x_body;
if (condition_var == 'y')
goto loop_y_body;
I grew up with MSX-BASIC: https://github.com/plattysoft/Modern-MSX-BASIC-Game-Dev/blob... – GOSUB jumps to a specific line number (RETURN goes back to just after the GOSUB). Even a fairly simple and clean example like this can be rather difficult to follow.
I kind of have: instead of GOTOs, we have RPC calls in the flavour of the day, which, just like GOTOs, only make sense after doing a full diagram of the call sequences.
Just like 8 bit BASIC spaghetti code, only refined.
You're lucky. If you want to feel comfortable on a plane, don't work on avionics systems written in the 1970s-1980s (and probably a lot from the 1990s). Some horrifically bad code running some planes.
Sure, it's a good letter, and indeed it's about this problem where people use a go-to or jump control flow with no structure whereas by the time he wrote that letter there were well understood structures (like: functions) to prefer.
However, while I sympathise with a C programmer who feels goto is necessary in their language, I think that speaks to a problem in the language rather than the programmer. If you have better structural support in your language you can express the things the author (of the link, not Dijkstra) wants to express without needing this unstructured go-to.
For example Rust's break 'label value; allows us to mark any compound expression with the 'label, and then say from anywhere inside that expression but nowhere else that we've decided the value of the expression overall and here's what it is.
This doesn't feel that different from what is being done here with goto, except for two crucial things as a result of being structured:
1. Rust will type check this, if this region of the program picks a Dog, our break needs to provide a Dog, it can't just shrug and expect the program to continue without one. This means maintenance programmers don't need non-local reasoning, this region of the program does, in fact, always pick a Dog, albeit the break 'label value is something to look closely at if you're reading that region itself.
2. We cannot do this, even by mistake (e.g. as a result of copy-paste) across scopes. If you try to break 'label result from the cat care loop into the dog loop earlier in the same function, that just doesn't compile, whereas the C goto has no problem attempting that (a good C compiler should notice if you try to do something really egregious, but good luck).
Interestingly, many seem to want to project their own opinion upon Dijkstra.
Dijkstra is clearly arguing for a “single entry, single exit” style. But the modern consensus seems to uphold single entry while accepting multiple exits from a block. Break, continue, early returns, exceptions - all are examples of multiple exit. These are more constrained than gotos, but nevertheless Dijkstra's argument applies to them also.
I personally believe early returns can greatly improve readability (when not nested too deeply) and that exceptions are typically cleaner than the alternative. But I acknowledge Dijkstra would disagree.
IIRC multiple entry points were common in Dijkstra's times.
Break, continue, and early returns always jump to the end of the enclosing block, unlike goto in the '70s. Exceptions are more complicated, and the cause of a lot of incomprehensible programs.
> It's better when the lexical structure of the source code maps to the execution structure.
Sure, but that has nothing to do with whether the GOTO statement should be used. The semantics of GOTO are arguably quite clear; they amount to setting a different continuation for the running program. (Semantically, this implies that the precondition of the GOTO statement is made a possible precondition for the label that the statement jumps to; and execution simply does not proceed to the next statement.)
There are even "relooper" algorithms to reconstruct a structured program from patterns in the idiomatic use of GOTO statements: they are used in compiling to structured object languages such as WASM code. Using GOTOs in an idiomatically sensible way (avoiding spaghetti code) may be less readable than writing actual structured blocks, but only slightly so.
> On the other hand, today we have the very opposite situation: programmers not using goto when it's appropriate and abusing other constructs, what ironically makes code only less readable.
One I see all the time from beginners is creating a finite state machine using one method per state and jumping between states by calling the next state’s method from within the current state’s method. Essentially just emulating goto with the added disadvantage that you’re pushing each new state onto the stack until it overflows, so refactoring it using goto would constitute an improvement.
The fact that beginners reinvent this pattern over and over demonstrates that it's easier for people unfamiliar with programming to reason about a program that uses goto, which would explain why it was so ubiquitous in the early days of computing. Given its shallower learning curve, its usefulness as a teaching aid, as a first step before structured programming, is being overlooked.
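For illustration, this is roughly what the goto refactoring looks like in C (a made-up two-state machine; the state names and counting are placeholders). Each transition is a jump, not a call, so the stack stays flat no matter how many transitions run:

#include <stdio.h>

static void machine(int n) {
state_a:
    if (n <= 0) return;
    printf("state A, %d left\n", n);
    n--;
    goto state_b;

state_b:
    if (n <= 0) return;
    printf("state B, %d left\n", n);
    n--;
    goto state_a;
}

int main(void) {
    machine(4);
    return 0;
}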
Nowadays you can direct clang to require tail-call elimination in C. [1] In gcc you can provide the optimization flag, -foptimize-sibling-calls, which is automatically selected at -O2, -O3, or -Os. [2]
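For what it's worth, I believe the clang directive in question is the musttail statement attribute (clang 13 and later; treat the exact spelling and constraints below as my recollection, not gospel). With it, the call-per-transition style stops being a stack-overflow hazard, provided caller and callee share the same signature:

#include <stdio.h>

static int state_a(int n);
static int state_b(int n);

static int state_a(int n) {
    if (n <= 0) return 0;
    /* the attribute forces the compiler to reuse the current stack frame,
       or refuse to compile if it can't */
    __attribute__((musttail)) return state_b(n - 1);
}

static int state_b(int n) {
    if (n <= 0) return 1;
    __attribute__((musttail)) return state_a(n - 1);
}

int main(void) {
    /* millions of transitions, constant stack usage */
    printf("%d\n", state_a(10000000));
    return 0;
}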
Not if you don't know how to write in only tail calls. Try explaining tail call elimination to one of these programmers learning how to write a state machine (=
If you’ve ever actually written more than toy C, you’ll know gotos are essential for resource cleanup and error handling. Even the famous goto fail was not a goto error, it was a block error (so named because a cleanup statement `goto fail;` always executed because of a brace-less if). goto can be abused like any other language construct, but it’s uniquely useful and makes code simpler and easier to reason about when used correctly. You just shouldn’t be using C any more unless under duress.
Not really, I have deployed plenty of C code into production and never used gotos, other than special flavoured gotos like UNIX signals.
My problems have always been in how memory corruption friendly C happens to be, nothing to do with how to use structured programming practices for resource cleanup.
I don't see any similarity between gotos and signals at all. Gotos are just explicit jumps, while the "normal" structured way of doing jumps is to have them inferred from nested regions (that double as scopes and lifetime guards for automatic variables).
The structuredness of such inferred jumps usually is both sufficient and convenient, but sometimes being extremely structured leads to boilerplate and inefficiencies. That shows already with early returns from nested scopes, which aren't widely frowned upon, while they are very similar to common usage of goto. I would say I haven't encountered a goto that isn't basically an early return from some nested scope to after some parent or grand-parent scope.
In that sense, a goto can save from having to extract a nested scope as a stand-alone function just to write it as an early return. I would say that many gotos in the wild could be rewritten as early returns after making a standalone function, but maybe sometimes this is too much of a hassle, or, subjectively, puts a toll on readability.
Signals are implicit GOTOs in the sense of INTERCAL's COMEFROM.
As for structured handling, instead of gotos all over the place: use inverted conditions for early returns, similar to what Swift has done with the guard statement, and embrace functions for resource cleanup. If the cost of a call brings cold sweat, have them inlined and called alongside a return; most compilers will replace such calls with jmp opcodes.
> Signals are implicit GOTOs in the sense of INTERCAL's COMEFROM.
What?
> guard
How does it help to have an inverted if statement? Not sure what the point is; you can do without it.
Goto isn't necessarily for resource cleanup. The common usage is as an early break from the currently executed block, to what comes after the block. Which is often cleanup code, but not necessarily.
You clearly don’t understand the problem. Imagine all the ifs properly inverted like you want them to be. Good. Now imagine you have to free 6 allocations, close 3 FDs, and wait on a few threads before returning. Imagine you have 10 early returns. Your cleanup function would be an unreadably silly 10-arg thing with extra pointers everywhere, and you have to call it 10 times. That's insane boilerplate just because you can't stomach a local goto. Why not just goto cleanup, avoid the mess, save a bunch of time, and clean up naturally and locally in the same function where everything is defined?
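For anyone who hasn't seen the pattern, the shape is roughly this (a sketch with made-up resource names and far fewer resources than the real thing):

#include <stdio.h>
#include <stdlib.h>

int do_work(const char *path) {
    int ret = -1;
    char *buf = NULL;
    FILE *in = NULL;
    FILE *out = NULL;

    buf = malloc(4096);
    if (!buf)
        goto cleanup;

    in = fopen(path, "rb");
    if (!in)
        goto cleanup;

    out = fopen("out.tmp", "wb");
    if (!out)
        goto cleanup;

    /* ... the actual work, with more early exits ... */

    ret = 0;                 /* success */
cleanup:
    if (out) fclose(out);
    if (in) fclose(in);
    free(buf);               /* free(NULL) is fine */
    return ret;
}

int main(void) {
    return do_work("in.dat");
}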
If you're steeped in C and its quirks, have good coding patterns that allow for it, I say knock yourself out.
I'm absolutely coding in C again — writing some old-school games using SDL. After working for decades with various retain/release, garbage-collected, magic-memory™ languages, going back to C feels like programming again. I like it.
> If you're steeped in C and its quirks, have good coding patterns that allow for it, I say knock yourself out.
Somewhere between 95% and 100% of people who think they're "steeped in C and its quirks, have good coding patterns that allow for it" write code that invokes undefined behaviour.
Undefined behavior means the language provides no guarantees about the outcome. Whatever happens is up to the compiler, or rather the specific implementation of the compiler, and the target architecture on which the program is run. These things are not constrained by any spec or guarantee, so they are allowed to change, for any reason, to anything. These changes may be triggered by not only different execution environments, but also changes to properties of a given execution environment. The behavior of a program with undefined behavior is non-deterministic, and cannot be predicted, modeled, or effectively maintained.
Like I said, Rust and Python have no specification at all, so technically any program written in them is undefined behavior, depending on the particular interpreter/compiler binaries you use and your architecture.
The only sin of C++ is the same one that plagues evolutionary languages like TypeScript, Kotlin and so forth.
No matter how many tools they provide to write better code than the languages they have grown from, maintaining compatibility with them while trying to have everyone adopt best practices is like herding cats.
Otherwise in regards to Arduino, I would suggest a couple of nice BASIC and Pascal compilers like those sold by Mikroe.
The ATmega328P an Arduino Uno or Nano uses has 2KB of RAM and 32KB of flash storage for the program. Having some sort of MicroPython interpreter there would be impossible, and even if it could be achieved, someone still has to write the low-level C/ASM code to make it all work. For many embedded systems, low-level languages like C or C++ (perhaps Rust for a lot of ARM micros) are the only sensible choice. If it's not a multi-user system that's connected to a network, people trying to abuse memory-unsafe code isn't a concern.
> has to write the low level C/ASM code to make it all work
shhh. Let's not disturb those that believe in the GC fairy that sprinkles their code with safety magic late at night while they soundly sleep. Firmware doesn't exist. I can't hear you. Na na na na na na na
I hope Rust takes off for embedded programming. I've been working with ESP32 and C is really the only choice if you are doing anything remotely fancy or resource constrained
What bothers me about Rust is that the designers are people who have been bitten hard by working on browsers written in C++, with its manual memory management and no way to enforce object lifetimes. Rust's solution to that seems like a big hammer when it comes to embedded, where you usually have either stack-allocated or compile-time-allocated objects.
If you code in that style you won't even notice the borrow checker is there. Unless you have a bunch of shared global state; then the overhead of Arc will seem silly on your single-core uc.
That kind of style is often correlated with a lot of shared global state, even in relatively high level software like database engines. Most software avoids shared global state by delegating that implementation to the operating system, which often comes at the cost of performance.
I have worked in embedded codebases like that. While some of it is unavoidable for I/O, I have worked in other codebases where the compile-time-allocated memory was moved around in a way that would have satisfied the borrow checker except for that first mutable borrow. I haven't done Rust for uc yet, so I won't claim it is a good fit, but it didn't seem like the parent poster had either, and I think there is a decent chance it could work well if the tooling is there. Which I'm not sure it is.
Python is more than adequate for most of the simple scripts that people run on Arduinos. There's no reason that a lightweight interpreter or even a compiled implementation could not run on even the AVR-based Arduinos, and obviously the beefier ones are straight up 32-bit ARM so it would be trivial to stand up MicroPython on them.
In RAII languages[0], you obviously don't need unrestricted gotos.
However, I always find myself missing it when writing nested loops. Labeled break and continue[1] ought to be considered standard structured programming primitives. These are restricted gotos, and allowing them to break or continue a parent loop doesn't unrestrict them much. But it does significantly improve the expressive power of your looping constructs (a C sketch of the goto they stand in for is below).
[0] C++, Rust, Go, or anything else with automatic memory management and destructors
[1] I've also heard of numbered break/continue. Personally I think this isn't good enough: what if I need to move loops around? That will change the meaning of a `break 2`. With a `break OUTER` the compiler will yell at me if I remove the outer loop without changing all the code that breaks out of it to target a different one.
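In C, which has no labeled break, the classic substitute is exactly the restricted goto being discussed (a sketch; the grid contents and the value being searched for are placeholders):

#include <stdio.h>

int main(void) {
    int grid[3][4] = {
        {1, 2, 3, 4},
        {5, 6, 7, 8},
        {9, 10, 11, 12},
    };
    int want = 7;

    for (int row = 0; row < 3; row++) {
        for (int col = 0; col < 4; col++) {
            if (grid[row][col] == want) {
                printf("found at %d,%d\n", row, col);
                goto found;          /* what "break OUTER;" would express */
            }
        }
    }
    printf("not found\n");
found:
    return 0;
}

With a labeled break (as in Java, or Rust's loop labels), the jump target is tied to the loop itself rather than to a free-floating label, which is exactly the restriction being argued for here.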
This is one of those half-true claims that C++ aficionados make, probably because they can play a bit fast and loose with resource cleanup in their particular situations.
RAII simplifies some of the resource cleanup, but at a cost: if the resource cleanup fails, there’s essentially no way to convey this.
So yes, you can write a destructor that tries to clean up your resources regardless of how the code exits its current scope. But if that cleanup encounters a problem, the destructor can, at best, try to convey this indirectly, either by (commonly) logging or (rarely) setting a variable to a value. It certainly cannot throw another exception, or even directly manipulate the return value of the function.
This is fine for some purposes, and completely unacceptable for others. But it’s not equivalent to the explicit cleanup and error handling in C.
So some of us occasionally find ourselves in an RAII language and using goto.
> RAII simplifies some of the resource cleanup, but at a cost: if the resource cleanup fails, there’s essentially no way to convey this.
RAII at least provides a decent default. After all, most resource cleanup cannot fail, or you cannot do much other than print an error. Now, if you need to catch errors during a particular cleanup, you can still do it manually, but RAII lets you focus on these few cases.
You can, and your program will call std::terminate if there’s already an exception being processed. Not exactly desirable if you’re trying to write code that ensures careful resource cleanup.
Also why it’s widely regarded as _wrong_ to ever throw in a destructor.
IMO this is a design bug in C++. The authors couldn't agree on what to do in the exception-during-unwind scenario, so they chose the worst possible option: crash.
In most cases, a second exception raised while another exception is already being thrown is merely a side-effect of the first exception, and can probably safely be ignored. If the idea of throwing away a secondary exception makes you uncomfortable, then another possible solution might have been to allow secondary exceptions to be "attached" to the primary exception, like `std::exception::secondary()` could return an array of secondary exceptions that were caught. Obviously there's some API design thought needed here but it's not an unsolvable problem.
If we could just change C++ to work this way, then throwing destructors would be no problem, it seems? So this seems like a C++-specific problem, not fundamental to RAII.
That said, there is another camp which argues that it fundamentally doesn't make sense for teardown of resources to raise errors. I don't think you're in this camp, since you were arguing the opposite up-thread. I'm not in that camp either.
> If the idea of throwing away a secondary exception makes you uncomfortable, then another possible solution might have been to allow secondary exceptions to be "attached" to the primary exception, like `std::exception::secondary()` could return an array of secondary exceptions that were caught.
Java has that for its pseudo-RAII "try-with-resources" statement: when an exception happens during cleanup of a try-with-resources statement, and the cleanup was because of an exception (instead of normally leaving the block), the inner exception is added to a "suppressed" list in the outer exception. Java exceptions have, since Java 7 (which added try-with-resources), both a "cause" field (for the exception which caused that exception, this exists since Java 4) and a "suppressed" field (which records the exceptions suppressed while cleaning up that exception).
I agree with your points about Java. I have direct experience with suppressed exceptions in Java 6 -- it was painful to debug ("where did my exception go???"). However, this works because Java forces everything thrown to be a sub-class of Throwable. (Please correct me if wrong.) C++ allows you to throw anything, including (bizarrely) null. I learned recently that C# allows the same -- you can throw null(!). How does C# handle suppressed exceptions?
In C#, if the finally-block of a try-finally throws, it replaces the current exception altogether; and using-statement desugars into try-finally.
And C# does not actually allow you to throw null. It does allow you to write "throw x" where x may be null, but that will just cause an immediate NullReferenceException at runtime.
Even if the standard consolidated on one way or another to pack up secondary exceptions (or discard them), how likely is it that the calling code will be able to handle and recover from this case?
I am personally on team crash - I would rather my program exit and restart in a known state than be in some weird and hard-to-replicate configuration.
So, I personally prefer to use exceptions for "panic" scenarios, like assertion failures, where the application has hit a state it doesn't expect and cannot handle.
Crashing makes sense in these scenarios if the application is only doing one thing. But I am usually working on multi-user servers. I would rather fail out the current request, but allow concurrent requests from other clients to continue.
Yes, I understand the argument: "But if something unexpected happened, your application could be left in a bad state that causes other requests to fail too. It's better to crash and come back clean."
This is not my experience in practice. In my experience, bad states that actually poison the application for other requests are extraordinarily rare. The vast, vast majority of exceptions only affect the current request and failing out that request is all that is necessary. Taking down the whole process is not remotely worth it.
Moreover, crashing on assertions has the unintended consequence of making programmers afraid to write assertions. In a past life, when I worked on C++ servers at Google, assertion failures would crash the process. In this argument, I saw some people argue that you should not use assertions in your code at all! Some argued for writing checks that would log an error and then return some sort of reasonable default that would allow the program to continue. In my opinion, this is an awful place to end up. Liberal use of asserts makes code better by catching problems, making the developer aware of them, and avoiding producing garbage output when something goes wrong.
> Moreover, crashing on assertions has the unintended consequence of making programmers afraid to write assertions. In a past life, when I worked on C++ servers...
The client-side rendition of this philosophy exists too. Some client engineers consider it bad form to allow the user to see the application crash. So much so that they'll actually advocate for harmful things like littering the codebase with default values, so that when something bad happens the application just keeps on chugging along in a state that nobody ever accounted for, doing who knows what to the user's data, because they hid errors in default values. It's really, really sloppy.
I am definitely team let the user see the crash. Then they know something went wrong, can be alert, and can try again if needed. They can report the problem so the devs are aware, or the devs' crash tooling will automatically do it. And, ultimately, the issue will get fixed.
(The original version of this philosophy was probably "don't let the user see the app crash, handle the error properly, showing something helpful to the user if necessary, instead". But when adopted by time-constrained product engineering teams, sadly nobody cares about properly handling error states.)
> Even if the standard consolidated on one way or another to pack up secondary exceptions (or discard them) how likely is the calling code going to be able to handle and recover from this case?
Not unlikely. Sometimes your unwind involves cleaning up things that throw for the same reason as the original failure - e.g. failure to communicate with some piece of hardware. But you still try going through that unwind, right? Eventually you leave the context of accessing your hardware device entirely and are back to just working with system memory, the standard streams and some files, which would probably work fine.
I have recently experienced this writing wrappers for the CUDA API for GPU programming.
Sort of, but nested_exception covers a different scenario. With nested_exception, the "attachment" is an exception which caused the exception it is attached to. In the scenario I'm talking about, the "attachment" is an exception which was caused by the exception it is attached to.
Anyway, the key missing thing is not so much the exception representation, but the ability to have custom handling of what to do when an exception is thrown during unwind. Today, it goes straight to std::terminate(). You can customize the terminate handler, but it is required to end the process.
If you need fallible cleanup, but also to try cleanup via RAII, it isn't hard to have a "cleanup" method that signals whether it succeeded, and an "already cleaned up" boolean member the destructor checks.
Even with RAII in C++, goto is still useful for handling the occasional error case where you need to reset/retry some operation e.g. due to transient hardware issues that are not unrecoverable errors. A common example that comes to mind immediately is asynchronous disk reads where the data was corrupted during transfer but may succeed if transparently cleaned up and re-issued.
I use goto rarely. But there are times when anything else would be inelegant, and in those instances I'll use it without hesitation.
> [0] C++, Rust, Go, or anything else with automatic memory management and destructors
Go's automatic memory management (GC) doesn't come in to play here, but the defer statement does. It doesn't make Go a RAII language, but it makes Go a language with a nicer "run this code at the end of the stack frame" feature than goto.
> In RAII languages[0], you obviously don't need unrestricted gotos.
You don't need them for teardown, but they still make sense for retries -- cases where a procedure needs to start over after hitting certain branches, e.g. a transaction conflict. I think `goto retry` is a lot more readable than wrapping the procedure in `do { ... } while (false)` and using `continue` to retry.
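A minimal sketch of the `goto retry` shape (begin_txn and commit_txn are hypothetical stand-ins for the real transaction API):

#include <stdbool.h>

bool begin_txn(void);    /* hypothetical */
bool commit_txn(void);   /* hypothetical: returns false on a conflict */

bool run_transaction(void)
{
    int attempts = 0;
retry:
    if (++attempts > 3)
        return false;        /* give up after a few conflicts */
    if (!begin_txn())
        goto retry;
    /* ... do the transactional work ... */
    if (!commit_txn())
        goto retry;          /* conflict: start the whole procedure over */
    return true;
}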
`goto` works very nicely with RAII here in that it'll invoke the destructors of any local variables that weren't declared yet at the point being jumped back to.
I find that normally if I need nested break then it suffices to refactor the target loop into a function and use return instead.
I don't think I normally miss multilevel continue, but the same strategy would work for that too. You'd just pull out the target loop body rather than the whole loop.
If that doesn't work because you need to select too many different break/continue levels (more than two) then maybe it's time to review the complexity of the function anyway.
> You'd just pull out the target loop body rather than the whole loop.
That's a "good solution" in the sense of "lambda is the ultimate goto", but if you're writing a loop over the rows and columns (or more dimensions) of something, pulling out and segregating the control structure for the innermost (and potentially other) layers of the hierarchy can make a simple depth-first exploration look obscure. If the total amount of code fits in a screenful, I'd rather put multi-level break or continue labels and then goto them sparingly.
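For example, a minimal sketch of the multi-level break with a label (the grid and target are illustrative):

#include <stdbool.h>

bool contains(int rows, int cols, int grid[rows][cols], int target)
{
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            if (grid[r][c] == target)
                goto found;      /* one jump breaks out of both loops */
    return false;
found:
    return true;
}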
That sounds like a case where some sort of iterator class would be best (or, in C, an iterator "class" made of a struct with associated functions). It could have helper methods for .next_column() etc., and you'd just have one overall loop.
That might make the code more complicated, overall, but that's the trade off of structured programming - occasionally there's more complexity but it's so exceptionally rare that it's still worth it overall. (Then again ... perhaps it would make the code arguably simpler anyway.)
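A rough sketch of that iterator-struct approach in C (names are illustrative):

#include <stdbool.h>

struct grid_iter {
    int row, col;      /* current position */
    int rows, cols;    /* bounds */
};

static bool grid_iter_next(struct grid_iter *it)
{
    if (++it->col >= it->cols) {   /* wrap to the start of the next row */
        it->col = 0;
        ++it->row;
    }
    return it->row < it->rows;
}

/* usage: the traversal collapses to one flat loop */
bool grid_contains(int rows, int cols, int grid[rows][cols], int target)
{
    struct grid_iter it = { .row = 0, .col = -1, .rows = rows, .cols = cols };
    while (grid_iter_next(&it)) {
        if (grid[it.row][it.col] == target)
            return true;           /* a single-level exit now suffices */
    }
    return false;
}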
tbh I see FAR more range-var-misuse with Go loops than defers. I've seen over 100 range-var problems, and seen lints catch many more (several thousand), but I've only seen a loop defer issue once. When a defer is needed, it seems like people both remember the issue better (`defer` is a new construct for many, `for` is not and habits from other languages can mislead them), and the code is complex enough to justify a helper func, where a defer is trivially correct.
I find that after writing a lot of Go, manually having to defer/scope(exit) is a lot more error prone than just RAII destructors: it’s impossible to forget to defer the destructor.
That seems like an inside-out way of doing it to me. I would schedule work onto the transaction struct and make it ultimately responsible for if it should roll back the work or keep it committed.
let mut tx = Transaction::new();
dofoo(&mut tx)?;
dobar(&mut tx)?;
tx.commit();
There is some overhead to boxing the rollback functions for dofoo/dobar into the transaction object, but it's far less error prone (or maybe you can avoid the boxing by encoding all the rollback operations at the type level: less ergonomic but not by much).
I have a distaste for all of these examples, which comes from the existence of a side-effecting operation: calling do_something() necessitates calling a cleanup function, which means there's some state being changed but hidden behind the internals of these methods. It is really easy to call this incorrectly, which says to me it's just a badly designed API.
In C# the idiomatic way would be to have each of these 3 things be defined in a class using IDisposable, which is similar to D's scope() -- the declaring class gets a cleanup method when the variable goes out of scope, no matter how that happens.
I assume there's some interaction between these classes, but IMHO that should be explicitly defined and so the code would look something like:
public object foo(int bar) {
    using var something = new Something(bar);
    if (something.Do()) {
        using var stuff = new Stuff(bar);
        if (stuff.init()) {
            using var stuff2 = new Stuff2(bar); // two "stuff"s looks dumb but this is example code
            if (stuff2.prepare()) {
                return do_the_thing(something, stuff, stuff2, bar);
            }
        }
    }
    return null;
}
There's actually several ways to structure this code which would result in something that looks better than the above, but being example code and not knowing how `something` and `stuff` interact, it's hard to write this nicely. I'd probably aim for something much more concise like:
public object foo(int bar) {
    using var something = new Something(bar);
    using var stuff = new Stuff(something);
    using var stuff2 = new Stuff2(stuff);
    return stuff2.prepare() ? do_the_thing(stuff2) : null;
}
In the above, I assume stuff2.prepare() calls everything it needs to on the dependent objects, but how I'd structure this for real entirely depends on what they're actually doing.
In C# it's customary just to wave away the worst kinds of problem that C and D developers try to handle, and let the runtime kill your program. (This is more an artifact of why people pick their languages than anything inherent on the languages themselves.)
But rest assured, your C# code is full of global state hidden on its runtime and is subject to the same kinds of errors people are discussing here.
D does support RAII. But RAII has a problem: if you want transactions A and B to either both succeed or both be unwound, RAII is a clumsy technique. It gets much worse if you need A, B and C to either all succeed or all fail. This article goes into detail:
The thought occurred to me the other day that assembly and BASIC share a lot of similarities in how you need to think of your program's flow, yet we ended with a world where assembly is considered respectable while BASIC basically (pardon) got burnt at the stake.
I've been reading a lot about retro gaming lately, so I'm just thinking in the context of bedroom coders of the 80's that learned to program in BASIC, then moved on to assembly to get more performance, and then later moved on to C and C++ as projects became more complicated. They all seem to have turned out okay.
I suppose what I'm getting at is you can write bad code in any language.
I was brought up on the GOTO statement in Fortran before the GOTO police outlawed it, so I'm quite familiar with its operation.
Nowadays there's more consideration given to structured programming and that's a good thing but that doesn't necessarily mean that GOTO should never be used—and if it is then it doesn't mean the whole structure of one's program ought to be called into question.
No doubt GOTO can be dangerous and can lead one into bad habits but in certain instances it can simplify code and make it less prone to introduced bugs. Modern coding practice teaches us to recognize and avoid spaghetti code so with those constraints in a programmer's mind he/she should be able to use GOTO effectively and with safety.
The key issue is to know when it's appropriate to use it and when not to.
The problem with goto is that it is an unbelievably primitive operator. It can be used to implement any logic at all, and therefore it does not express any logic very clearly. It's a bad way to express intent in code. Aside from the exceptions discussed in the article, there is always a better, clearer way to express logic than to use goto statements.
The same is true of while loops. Aside from a few cases where they are required, they are always better rewritten with a less primitive operator (for, etc). The arguments that programmers today make in defense of while are quite similar to the arguments programmers used to make in defense of goto.
The problem with this is that it requires introducing a vast zoo of features distributed across the spectrum of power, and it is not at all clear that it is actually easier to learn this whole zoo so you can select precisely the least powerful point on it than it is to understand the use of a single (or small number of) all-powerful constructs within its context.
My goal is “most reliable and concise for experts,” because we should spend most of our careers as experts, and “easy to learn” requires bad tradeoffs too often. Learning common names and reusing tested implementations pays off over rolling my own on the spot and forcing everyone else to re-read it.
This is false. Tail recursion is clearer and easier to reason about than loops, and tail recursion can be rewritten into gotos. I mean rewritten in a direct way, where the shape of the logic is the same.
E.g. the even/odd problem. Let's use GNU C with local functions:
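(A sketch, with even_ and odd_ as illustrative names for the mutually tail-calling nested helpers; `auto` is GNU C's way of forward-declaring a nested function.)

#include <stdbool.h>

bool even(int x)
{
    auto bool even_(int x);   /* forward declarations so the two nested */
    auto bool odd_(int x);    /* functions can tail-call each other */

    bool even_(int x) {
        if (x == 0)
            return true;
        else
            return odd_(x - 1);
    }
    bool odd_(int x) {
        if (x == 0)
            return false;
        else
            return even_(x - 1);
    }

    return even_(x);
}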
Using goto: achieved by a mechanical transformation involving just some local edits:
bool even(int x)
{
    goto start;
even:
    {
        if (x == 0)
            return true;
        else
        {
            x = x - 1;
            goto odd;
        }
    }
odd:
    {
        if (x == 0)
            return false;
        else
        {
            x = x - 1;
            goto even;
        }
    }
start: goto even;
}
Every tail-called local function just becomes a block headed by a goto label. The tail call is replaced by assigning a new value to every argument variable and performing a goto. Someone who is briefed on the approach here can easily see the original tail recursion and maintain the code in such a way that the tail recursion could always be recovered from it.
There was a discussion several years ago in comp.lang.c where a problem was proposed: using whatever approach you see fit, write a C program which strips comments from C code, but preserves everything, including preprocessor directives. Something like that. The person who proposed the problem refrained from posting his solution for several days. He used tail recursion for the entire state machine of the thing (even avoiding if statements; all the cases in the tail functions were handled by the ternary ?: operator).
Others used structured programming: nested loops and such. My solution used goto.
I argued that the goto solution had all the good properties of the superior tail calling solution.
I then supported my argument by writing a text filter which converted that person's tail call program into one with a big function containing goto blocks (compiling and producing the same result and all). A reverse filter would be possible also.
I believe that we can take any mess of a goto graph, divide it into the labeled nodes, round up the variables and everything being done to them, and express it as tail recursion. Ironically, the one thing that will make it a bit harder is structured control flow constructs like while, for, switch and what not, where we may have to rewrite those to explicit goto first! E.g. if we look at a while loop, it's like a tail call, but one which is invisible. The end of the while loop body invisibly tail calls to the start, which is bad for understanding.
The thing that will detract from the ability to understand the tail call graph is excessive parameters. In the worst case, every tail function will have to take all of the state variables as parameters, and pass them all to the next tail function (except for altering some of them). There is a pass that can be done over that to reduce some of these. Like if some tail function foo(a, b, c, d, e, f) doesn't do anything with c, d, e, f other than pass them to children, and none of those children do anything with those variables (transitively), we can cull those parameters from foo and all the children. This is the hard thing to understand in goto graphs: which of the numerous state variables are relevant to where the goto is going?
Some state vars can be replicated and localized. E.g. in our even() example, we can do this:
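(A sketch, with x1 as an illustrative name for the odd block's private copy, alongside even's x0.)

bool even(int x)
{
    int x0, x1;            /* x0: even's parameter, x1: odd's parameter */
    goto start;
even:
    {
        if (x0 == 0)
            return true;
        else
        {
            x1 = x0 - 1;   /* simulate passing the argument to odd */
            goto odd;
        }
    }
odd:
    {
        if (x1 == 0)
            return false;
        else
        {
            x0 = x1 - 1;   /* simulate passing the argument to even */
            goto even;
        }
    }
start:
    x0 = x;
    goto even;
}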
Now we no longer have the same variable on both sides of an assignment. Each block works with its private parameter variable. The other block only ever assigns to that variable when simulating parameter passing: e.g. the odd block assigns to even's x0 just before goto even.
We can start with a goto graph and make incremental improvements like this and recover a tail call graph. We can then try to understand what the tail functions mean in terms of recursion and document that.
> Tail recursion is clearer and easier to reason about than loops
tail recursion is a complex and subtle expression of control flow that requires substantial background knowledge to be able to even understand, much less reason about based on code on a page
for loops are immediately intuitive to anyone, even without any programming training
no idea how you can come to this conclusion. just ain't so
> tail recursion is a complex and subtle expression of control flow
That is simply nonsense; it's just function application, ideally without having to think about state (or as little state as possible).
> for loops are immediately intuitive to anyone
That's just hand-waving without some sort of psychological data. Even if it were true, it would not be relevant because you can't just hand over software maintenance to just "anyone" pulled off the street who finds some language feature intuitive. (In my anecdotal experience, on the contrary, non-programmers have a very poor intuition for the idea of changing variables by assignment and what that means when a backwards jump takes place.)
A solution expressed recursively is demonstrably, quantifiably easier to reason about informally or prove correct than loops and state variables. Of course, not to "anyone" with no programming background just pulled off the street, but to people who have the understanding and skills. There is simply less proof material. An entire body of proof techniques that are required with imperative loops are absent. You just use straightforward inductive reasoning instead of pre and post conditions over stateful variables, and loop invariants and whatnot.
Anyway, the idea that for loops are somehow easier than recursion is definitely not from mainstream CS; it's just some fringe view.
That only shows you haven't known any programmers who had a broad exposure to the topics of their supposed craft; it doesn't speak to the actual topic itself.
Recursive solutions being easier to verify is quantifiable. This is not some popularity poll.
Some people are not well-versed in some techniques. Recursion is not always well supported in programming languages. In standard C if we want to use recursion, we will have to write multiple standalone functions that have their own scopes, whereas if we put together several loops, we can have those all in the same scope, with convenient access to common local variables. That could make a decisive difference. Not all languages have tail calls; what looks like tail recursion can "blow the stack".
When I say that it's "easier" I don't mean that anyone of any skill level and background can more easily design and implement a recursive solution for any problem, and in any language. Rather, that when the recursive solution is discovered, it is easier to convince oneself that it is correct: that it's handling all the cases and terminates, with the correct value.
the only programmers who are concerned with proofs are located in universities and are writing theses, which are statistically 0% of programmers overall
programs are recipes, not proofs. "for 10 times, do this" is in almost all cases trivially easier to understand and maintain than a recursive alternative. this isn't controversial in any way
Anybody that dogmatically avoids something like a cult probably doesn't practice the holistic thinking necessary to design large systems.
People mocking the use of goto is a bozo bit switch for me. It can switch back, but offering pithy out of context absolutisms and trying to pass it off as wisdom is a hard point to recover from.
It's almost always in one's interest to play dumb and not be sure of anything. That's what I see the smart people do.
As a kid I once saw a quicksort implemented in BASIC and it looked like magic to me. I thought how could anyone come up with this algorithm….
Then I saw it implemented recursively in lisp…it was the simplest most obvious choice to make, and it was hard to imagine anything else.
Now I think that the BASIC implementation must have been a translation from the lisp or something equivalent and it would have been very unlikely that any native BASIC programmer would have come up with it on their own. Of course it used GOTOs.
With the proper abstraction level any complex algorithm becomes trivial.
The state machine example is definitely a very fitting use of goto, but it reminds me of another thing that seems to have become a rare skill but is very useful: flowcharting. Besides making people comfortable with goto in general, it also helps visualise control flow in ways that a lot of programmers these days don't realise, and it's unfortunate that a lot of courses seem to have omitted its teaching.
And here Microsoft provides us with a lovely example of such ridiculous nesting.
That's a very memorable example, but ultimately the true cause of that monstrosity is a clearly stupid API design; this is the API for a file picker, the recommended replacement for an existing one that they wanted to deprecate. In the existing one, you fill in a structure and call a single function with a pointer to it. In its replacement, you need to call a dozen methods on an object, and check for "possible" errors on each call, even if probably 99% of them only do things like assign to a field in a now-opaque structure and can never produce an error. Then the example code must've been edited by someone with severe gotophobia. (Not all MS code is like that --- they have plenty of other example code that uses goto, e.g.: https://github.com/microsoft/Windows-driver-samples/blob/mai... ) The existing API was even extensible, since it used a structure with a size field that could differentiate between different versions and extensions, but they didn't.
I'm browsing this, and I'm not seeing the way I do it, which is sort-of-like #5 but not quite... I tend to wrap the code that has multiple exits-to-label in a do...while(0) loop, and use break to get there...
So it might look like:
do {
    if (false == call_func1()) {
        cleanup_any_state();
        break;
    }
    if (false == call_func2()) {
        cleanup_any_state();
        break;
    }
} while (0);
At any point you can branch to the common exit-statement, and keep on testing for failure as you go through the algorithm without indenting forever.
Generally there's not too much state to clean up in the code I've been using this in, but obviously later 'break' conditions would have to clean up the state for earlier ones too. That's easy to abstract into functions for the cleanup, though.
I don't like the code pattern if ( false == func() ). Function func() already returns a boolean; there's no need to compare it to a second boolean (false or true) to generate a third boolean.
It's there for illustration, that's all. I don't actually write code like that, but I want to make sure that you understand the condition in the code snippet.
So the sort of stuff I've been using this in is encryption/decryption, where there's a whole boatload of things you have to set up, read/create OIDs, configure identities, fetch certificates, match algorithms, etc. etc.
All of that has to be ok before you finally get to the bit that does the work, and since I'm using ObjC with its ARC feature, I don't need to deallocate anything, they'll be deallocated as they go out of scope. I tend to release critical RAII stuff in my -dealloc method anyway if there's anything there to be done.
So it's really a whole long list of
id result = nil;
do {
    if (setup-X-fails)
        break;
    if (setup-Y-fails)
        break;
    ...
    result = call_method(X, Y, Z, A, ..., F);
} while (0);
... which works out pretty well, and is very readable. The setup-xxx stuff can be several pages of code for each method - and useful in their own right, so integrating it into the loop doesn't seem preferable.
Manual goto cleanup is such busywork, adding nothing of value, only places for potential leaks and UAFs.
I know for C it’s unthinkable to standardize such a luxury like defer or destructors, so we’re going to relive arguments from 1968 for as long as C is used.
> I know for C it’s unthinkable to standardize such a luxury like defer or destructors, so we’re going to relive arguments from 1968 for as long as C is used.
There was a proposal for defer in C23 but it didn't make the cut [1]. There is also the __cleanup__ attribute if you're using GCC.
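A minimal sketch of the GCC attribute (the handler receives a pointer to the annotated variable when it goes out of scope):

#include <stdlib.h>

static void free_ptr(char **p)
{
    free(*p);                /* runs automatically when buf leaves scope */
}

int work(void)
{
    __attribute__((cleanup(free_ptr))) char *buf = malloc(64);
    if (!buf)
        return -1;
    /* ... use buf; it is freed on every exit path from this scope ... */
    return 0;
}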
Then I looked up 'callbacks considered harmful'. Hmmm....
It's probably easier for an IDE to track down a function if you want to see how the code works, however, and the function might have some useful explanatory comments.
Suggested article: "People who claim code can explain itself considered harmful"
int foo(int v) {
    // ...
    int something = 0;
    switch (v) {
    case FIRST_CASE:  something = 2; goto common1;
    case SECOND_CASE: something = 7; goto common1;
    case THIRD_CASE:  something = 9; goto common1;
    common1:
        /* code common to FIRST, SECOND and THIRD cases */
        break;
    case FOURTH_CASE: something = 10; goto common2;
    case FIFTH_CASE:  something = 42; goto common2;
    common2:
        /* code common to FOURTH and FIFTH cases */
        break;
    }
    return something;
}
The D version:
int foo(int v) {
    // ...
    int something = 0;
    void common1() {
        /* code common to FIRST, SECOND and THIRD cases */
    }
    void common2() {
        /* code common to FOURTH and FIFTH cases */
    }
    switch (v) {
    case FIRST_CASE:  something = 2;  common1(); break;
    case SECOND_CASE: something = 7;  common1(); break;
    case THIRD_CASE:  something = 9;  common1(); break;
    case FOURTH_CASE: something = 10; common2(); break;
    case FIFTH_CASE:  something = 42; common2(); break;
    default: break;
    }
    return something;
}
Note the use of nested functions to factor out common code. The nested functions usually get inlined by the compiler, so there is no cost to them. Nested functions are a great way to eliminate gotos without penalty.
> The nested functions usually get inlined by the compiler, so there is no cost to them.
This kind of code typically gets written when ‘usually’ isn’t good enough (of course, once you use a compiler, in theory, there are no guarantees; the compiler could compile the inlined-function one with a goto or vice versa, but programmers typically are more concerned about what happens in practice)
The inlined functions also may increase code size and instruction cache pressure.
On the other hand, having a branch less may be beneficial.
I agree that programmers care more about what actually happens (and they should, when performance matters!), but this kind of analysis also involves future changes to the compiler unless it's a one-time job. Which sometimes exists, so check the assembly there and do whatever you need.
Straightforward and limited-scope code like nested functions tends to improve in performance over time, because it restricts possibilities better than goto. And it's more error-resistant to future changes for similar reasons. If your code has to last a while, you're probably better off having the safer one. Or maintain both, and use the safer one to validate the unsafe one, and choose based on benchmarks of the week - what was true when it was written could change with any version.
I actually prefer "goto-less alternative 2". It's more verbose, but more explicit. No magic; I know exactly what happens. If you suddenly have more functions, you should have an array of functions with a clean_up_level variable.
Of course, you probably should not have global state that some clean-up function with a side effect deals with in the first place, but this is C in the Linux kernel, so I assume there is something I don't know.
Not in the Dijkstra "Go to statement considered harmful" sense. Nor are the goto keywords found in most modern languages. These retain structure and thus are not considered harmful.
But there is a good case to be made that exception handlers are gotos in the Dijkstra sense, at least when used for anything other than exceptions, like passing errors around.
If you read Dijkstra's letter it is clear that early returns (i.e. any return which is not the last statement in a function) are subject to the same criticism as goto.
> These retain structure and thus are not considered harmful
This might be your opinion, and it is a very reasonable opinion. But it is just not what Dijkstra is arguing. He is very clearly arguing for “single entry single exit”.
> He is very clearly arguing for “single entry single exit”.
Agreed. Return forces you into a single exit. Upon hitting return, the code can only return back to where the function was originally called. It cannot 'arbitrarily' jump to some other place in code as you could do in an unstructured programming language like, say, BASIC. Which is what Dijkstra was pushing for, being a strong proponent of structured programming.
I don't know of any modern programming language that does allow anything outside of a single exit, exception handlers and setjmp/longjmp excepted. There is a good case to be made that the latter two reintroduce the very problem Dijkstra warned of and are generally considered harmful for the same reason.
Dijkstra is not just arguing all exits should return to the same point, but also that you shouldn’t enter or exit in the middle of a block. Execution should consist of executing blocks zero or more times, but either fully or not at all.
Exiting in the middle of a block would be just as bad as entering in the middle, according to the argument he is making.
You certainly wouldn't be the first to hold that view.
However, Dijkstra accepts abortion clauses, which is what return really is (it is not an exit clause). My take is that his argument is that an unbridled go to is too primitive and that he believed go to statements should be bridled by additional structure that help describe the process, not that go to should be avoided entirely.
While I think we can agree that return is go to, it is a bridled go to. It strictly limits what a programmer can do, avoiding the mess Dijkstra claims an unbridled go to promotes. It is predictable and understandable, clearly describing the intent.
Exception handlers have no such strictness. It is not clear, without studying the program in its entirety, where your code will end up. That can be a good tradeoff when you are dealing with exceptions. The only reasonable response to encountering an exception is to ultimately crash, so at that point who cares? But, indeed, using exception handlers for control flow (e.g. passing errors around) is considered harmful.
It’s interesting to think that when I wrote DVIView (a TeX DVI previewer running on VM/CMS) in Pascal/Web in 1987, I found goto to be an absolute necessity (although part of that was doubtless because the relevant logic was closely modeled on Knuth’s DVItype, which also used goto). I would have a hard time coming up with the last time that I needed a goto since then (or, for that matter, labeled break/continue).
Recently I had an argument with somebody bragging about how they managed to avoid goto by writing the following mess:
do {
r = allocate_resource();
if (!r)
break;
} while (false);
When I said that this is just a messy, unreadable version of goto, I was told that goto is harmful and this is structured programming, which is superior. GOTOphobia is real.
It's not GOTOphobia. It's people blindly doing what others are telling them to do (or not to do) without the ability to critically think about what they're being told.
Lots of people just follow whatever anyone, who they perceive as authority, says. Critical thinking isn't a common trait.
This person is living in a pretend bubble that isn't grounded in the reality of large projects, multiple team members, deadlines, changing requirements, etc.
No programmer is perfect. And when your tool can cut your arm off, you should be careful or route around the dangerous bits when possible.
I see this "if you're good then you don't need safety" mentality in a lot of conversations with C programmers about programming languages.
Maybe it's some kind of defensive response against newer programming languages slowly eating up spaces where C used to be dominant (like command line tools and system daemons), maybe it's just the programmer saying this thinking they're that special. Either way, most people saying this are setting themselves up for failure.
I wouldn't start a new project in C unless I absolutely have to but if you disagree, you can at least admit that there are dangers in C that you need help with if you want to be sure you're doing everything right. With most warnings treated as errors, extended warnings enabled, linting to make things like missing brackets obvious, static analysis of all code to spot difficult bugs, automated dynamic analysis of test cases and a proper testing pipeline I believe you can write C code that's safe enough.
In this case, I'm willing to give the authors the benefit of the doubt because goto hatred is worse than the risk posed by goto in most settings. Gotos used right are fancy if/switch statements and avoiding them in C can lead to a mess that doesn't add much safety. Most examples given are better in my opinion, because using goto can replicate the code flow modern languages provide with things like when/match/defer keywords.
Nah, it has been like that since C has existed; they used to say that those of us who preferred languages from the ALGOL systems programming lineage were coding with a straitjacket, or using nanny languages.
I learned to appreciate C++ RAII already on Turbo C++ for MS-DOS around 1993, and using raw C has always been because I was required to deliver C code for some university projects or work.
This analogy implies that all other tools are 1000% safer, but they are not. In C it’s like pointing to a dusty corner behind a table in a room full of dirt.
When was the last time you did “cut your arm” with goto specifically? What’s the count and time ratio to other issues? Were these also addressed as taboo or left as “experience earned”? Gotophobia in its largest part is just a stupid meme with no real world data.
30% safer, 30% more readable, and 30% more productive would be even better.
> When was the last time you did “cut your arm” with goto specifically?
It's been a while since I've used C, and even longer since I've personally written goto statements. I do remember frequently getting tripped up on them right after undergrad. It's not friendly, and I don't ever wish to touch them again.
I'm working in a C++ game engine project right now and it's constantly segfaulting. I can't imagine that setting register jumps manually in complex higher level code would improve my situation.
When I get to choose the language, I use Rust. It fits the C use case and fixes many of the warts.
> I do remember frequently getting tripped up on them right after undergrad. It's not friendly, and I don't ever wish to touch them again.
So it's something bad from the undergrad past, no details. Must we take advice based on that? I'm not sure I will.
> I'm working in a C++ game engine project right now and it's constantly segfaulting. I can't imagine that setting register jumps manually in complex higher level code would improve my situation.
Neither would tabooing something based on weak or no evidence. “It doesn’t help here” and “we ban it and ostracize its use” are two different claims.
I always put a goto in all of my code just to mess with the "goto considered harmful" people. Most people don't notice but every once in a while I find someone who can't make it past the goto and it puts a smile on my face. =)
I had a debate at work about whether guards are considered spaghetti code. The same person also thought that you should never use GOTO's. He learned these rules in his comp-sci classes and never questioned them; even after years in the real world.
Personally, I use guards heavily to reduce nesting, and if the language supports gotos, I'll use them if it makes sense to improve code flow. However, it's been a very long time since I've needed to use goto :)
I find it similar to the MISRA rule that says there should only be a single return in a function.
We make an exception for this at work for checks at the start of a function (eg. NULL, ranges, etc.) and it tends to save an indentation or two. Then generally still follow the single return rule otherwise.
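A small sketch of that convention (names are illustrative): guard checks may return early, and the rest of the function keeps its single return.

int sum_scaled(const int *values, int count, int factor)
{
    /* guard clauses: early returns only for argument checks */
    if (values == NULL)
        return -1;
    if (count <= 0)
        return -1;

    int result = 0;
    for (int i = 0; i < count; i++)
        result += values[i] * factor;
    return result;               /* the one "normal" exit */
}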
I wonder if the GOTO fear is, in part, because in older languages, GOTO used to point to a line number and not a labelled block. The former seems horrifyingly brittle while the latter seems like it could possibly be useful.
“Goto fail” would have been just as wrong in a non-goto language that uses RAII.
The problem was not goto; it was that the else path was also “give up”. In a non-goto/RAII/defer language it would have been something like:
if (error)
    return;
    return;
My assumption has been this was some kind of merge error rather than being wrong off the bat. Interestingly, mandatory indentation or mandatory braces might have stopped this, but then I would have thought -Werror with the dead code warnings would have as well :-/
But again the error was not the goto, and believing it was is the exact problem the article is talking about: people are so opposed to goto that they are unable to see real issues (I have seen C code that tries to avoid goto completely, and the error handling becomes horrific, far more complex, and far more error prone than just using goto). The problem with goto is that it is very easy to use it unnecessarily, in ways that complicate control flow but don't actually make things better.
For the curious this flag has been available since GCC 6.1 (2016) and Clang 10 (2020). It is enabled by -Wall for both compilers.
Interestingly (or perhaps coming full circle), the bug referenced by the GGP comment is also called out in the GCC 6.1 release notes:
> -Wmisleading-indentation warns about places where the indentation of the code gives a misleading idea of the block structure of the code to a human reader. For example, given CVE-2014-1266
sslKeyExchange.c: In function 'SSLVerifySignedServerKeyExchange':
sslKeyExchange.c:629:3: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
   if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
   ^~
sslKeyExchange.c:631:5: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if'
     goto fail;
     ^~~~
Lack of code review and testing always seemed like the most pressing problems with that; both of which should have caught this error. Yes, it probably also should have had braces, but that seems like the lesser issue.
That error is not due to goto; it was just a goto which was erroneously executed because of badly formatted code. (It looked like the statement was inside an if-block due to the indentation.)
Python would have prevented this bug, but so would a formatter. Rust also requires braces for if-blocks to prevent this kind of error.
Using a language feature that is a known footgun (`if ...` instead of `if {...}`) without being cautious enough to avoid shooting yourself in the foot is not the fault of the footgun, it's the fault of the programmer.
Additionally, in the above linked case, the problem isn't a misused `goto`, it's a misused `if ...`. It would be just as problematic if they typed `cleanup_context();` instead of `goto fail;`, but nobody complains about cleaning up state, do they?
That one in particular was not really caused by goto but rather by braceless if statements; it'd be a vulnerability all the same if the line had been a call to a "fail" function instead of a goto.
C# (and Go) have adjusted goto to ensure consistent scoping, avoiding undefined variables, and keeping control flow reducible. So it's a much safer and better form of goto than what C/C++ expose, where you can still do weird things without being warned by the language.
The best analogy I ever heard on this is that using a goto is like knocking a hole into a wall: it can be very useful, or even essential, in some specific circumstances, but you should give it a bit of thought. Also, it would be foolish to swear off ever allowing either.
This blogpost is horrible! The title is good; you can tell whether someone actually uses C at a decent level based on whether they describe SESE (single entry, single exit) and how you use gotos to achieve that.
BUT, the fact that they have multiple goto locations in one function violates this! Only one goto location! That goto is goto cleanup, or goto exit. What you do is then check the state of each variable you clean up. Every function should be some variant of this. If anyone writes C in any other style than SESE, you can consider them a subpar C programmer. There are variations, like using BOOL and in and out variables. I like them, but there are different styles. But anyone not using a single AND ONLY A SINGLE goto label in every function is 100% a subpar C programmer who you should not trust.
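For reference, a minimal sketch of that style (resource names are illustrative): one cleanup label, and the cleanup checks each variable's state before releasing it.

#include <stdio.h>
#include <stdlib.h>

int process(const char *path)
{
    int ret = -1;
    char *buf = NULL;
    FILE *f = NULL;

    buf = malloc(4096);
    if (!buf)
        goto cleanup;
    f = fopen(path, "rb");
    if (!f)
        goto cleanup;
    /* ... the actual work ... */
    ret = 0;

cleanup:
    if (f)                   /* each variable's state decides its cleanup */
        fclose(f);
    free(buf);               /* free(NULL) is a no-op */
    return ret;
}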
Would you say that the Linux kernel is written by mostly "100% subpar C programmers"? Because it's an extremely common pattern to have multiple goto labels at the end of a function.
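The pattern in question looks roughly like this (illustrative names); each failure jumps to a label that unwinds exactly what has already been acquired:

/* hypothetical acquire/release pairs */
int acquire_a(void); void release_a(void);
int acquire_b(void); void release_b(void);
int acquire_c(void);

int setup(void)
{
    int err;

    err = acquire_a();
    if (err)
        goto out;
    err = acquire_b();
    if (err)
        goto out_release_a;
    err = acquire_c();
    if (err)
        goto out_release_b;
    return 0;                /* success: keep everything */

out_release_b:
    release_b();
out_release_a:
    release_a();
out:
    return err;
}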
Yes. There's a reason pretty much every secure C coding standard, like CERT C etc., dictates exactly what I said. There's a reason they have weird bugs. Just because it's an impressive piece of software doesn't mean it can't have horrible design patterns written by substandard coders. And in an open source project with as many contributors as Linux, I would say it's not hard to fathom that there's a significant number of substandard people writing code on that codebase. Even MISRA, quoted in the article, intends, I believe, that you only have one goto location.
I'm not saying not to use goto. The above example works in any version of C, with some tweaks needed for K&R. I've done substantial kernel work and can tell you that there's no reason to ever break my example and put in multiple goto stubs. Can you provide a single situation where it is needed and there's no other alternative? I can't prove the negative you want me to.
I don't see how the number of gotos is relevant. You still have a lot of gotos in each function in the codebase with SESE and only one goto location, used solely for cleanup and exit.
A more interesting discussion between your point and his would be to show simple examples where each of the views breaks down. When you're dealing with allocation or handle cleanup, SESE sounds good to me. But with multilevel loop breaks or continues, even observing SESE I can see room for more gotos. But I don't know what either you or he are talking about.