The following code is probably my favorite, "What will this program output?" example. This is taken from the Quake III Arena code in "q_math.c" [1].
Note line 561 [2]. Non-obfuscated code, and one is left wondering... just what, exactly, is going on at that line?
I understand that the point of the paper is to analyze code without having comments to help, but it serves as a reminder to me that commenting is important in helping not only other developers understand the code, but to help myself when revisiting code particulars that may have faded from memory.
I find this to be a great example when I hear, "I don't comment; the code itself is self-documenting."
552 float Q_rsqrt( float number )
553 {
554 long i;
555 float x2, y;
556 const float threehalfs = 1.5F;
557
558 x2 = number * 0.5F;
559 y = number;
560 i = * ( long * ) &y; // evil floating point bit level hacking
561 i = 0x5f3759df - ( i >> 1 ); // what the fuck?
562 y = * ( float * ) &i;
563 y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
564 // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
565
566 #ifndef Q3_VM
567 #ifdef __linux__
568 assert( !isnan(y) ); // bk010122 - FPE?
569 #endif
570 #endif
571 return y;
572 }
I would probably write it this way today (well, actually I wouldn’t write it today, I would instead use the reciprocal square-root instructions provided by all the common architectures, but let’s pretend):
The only semi-cryptic thing about it, to someone versed in numerics, is line with the magic number 0x5f3759df. While I would expect a professional to be able to quickly re-derive it and understand the intention, I would still write a comment, along the lines of “We approximate the floating-point representation of 1/sqrt(x) with a first-order minimax polynomial in the representation of x.”
All that said, I tend to write a lot of comments. Much of the code I write professionally is math library code, and it’s not uncommon for me to have a ratio of several paragraphs of error analysis or correctness proofs to a few lines of code.
It's not just semi-cryptic, it's magic. Your example above includes "xrep/2" instead of i>>1, but this does nothing to increase the understanding of what the fuck is going on here.
This is a floating point number and some deep juju bit twiddling is happening with it, it's not just being divided by 2. The exponent is being effectively divided by 2 and the fractional part of the mantissa is also being effectively divided by 2. Except in the case where the representation of the exponent ended in a 1, in which case that 1 is shifted down into the fractional part of the mantissa. Exactly why this is ok and doesn't hurt the approximation is a subject that could fill a report.
Floating point representations are approximately logarithms. It should be clear to anyone with working knowledge of floating point numerics that dividing by two and negating the representation of a floating point number corresponds to taking the reciprocal square root of the number. That's not magic, that's math.
The constant simply accounts for the bias in the floating point representation and balances the error over the domain of interest. Again, not magic.
Explaining the details of the bit-twiddles does not make it any less bit-twiddling. And explaining the details behind the opaque steps does not make them any less opaque to the uninitiated.
You're using bit operations on a data type with a non-trivial bit structure. That's pretty much the definition of bit-twiddling. The fact that blunt bit operations on such a finely-defined structure can correspond to useful mathematical operations is non-obvious.
The fast inverse square root trick is clever, but this is right on. It's bypassing the provided interface for FP operations and mucking around with implementation details. That's not to say that that's a bad thing, or that floating point numbers aren't a leaky abstraction, or that the real world doesn't demand this sort of optimization from time to time. But it's still true.
I think a lot of people use the "self-documenting" excuse without knowing what "self-documenting" really is, and what it brings to the table. Self-documenting code is a good fit for a reasonably small and straight forward function. If you have a function called get_thingy() in the context of some sort of "thingy container" then you probably don't need to go out of your way to explain "This method gets a thingy from the thingy container."
This can even be expanded to more complex single-purpose functions that are not exposed directly to any external APIs. For instance, if you have a functioned called restart_thread() which first tries to stop a thread, then stats a new one then you might be safe.
However, as soon as you leave the realm of the obvious, comments are a must. Clearly in situations where you're trying something clever, like that Q_rsqrt function, missing comments will boggle all but the most specialized professionals.
Unfortunately people often forget about another important type of comments; those that explain what function this piece of code serves in the program. I've lost count of the number of "Context"s or "Interface"s or "get_data"s I've seen when trying to read someone's code. In those situations a single line like "This context links the rendering pipeline to the physics simulation" would go a very long way.
Instead when I see this sort of code it will either not have any comments at all, or it will be a dissertation that all the theoretical uses of the piece of code (Omitting what it's actually used for in the current context).
# In those situations a single line like "This context links the rendering pipeline to the physics simulation" would go a very long way.
Why not rewrite your code so there is a rendering_pipepline variable and a physics_simulation variable and a function called link? Then the comment is redundant.
That's what refactoring for self-documentedness is to me.
While that is a perfectly reasonable solution to the minimal example I provided, there may be reasons to abstract away the concept.
Personally, I like having some sort of structure in memory that allows me to access and query contextual information. It's amazingly useful if you are trying to link a variety of different elements. In fact it's practically unavoidable if you are writing highly dynamic code. Trying to manage this sort of system with a few variables and functions would become a nightmare.
In fact, those dynamic situations are where comments describing layout are absolutely critical. In these situations you may be using your class as a generalization, and tracking the full extent of this sort of interaction could take a ridiculous amount of time.
That is a great example. Some code just absolutely needs extra documentation around it. IMO, this is mostly true with more complicated or non-obvious algorithms like your example. There aren't really much more intelligible names you can give those variables; the algorithm just needs them.
Still, I find that with 90% of the code I write for work, it's not so difficult to come up with variable and method names that make it pretty clear what's going on. I take the approach that I'll add comments anywhere I think that naming might not be enough, but that very rarely happens.
Granted, I'm not writing low-level, math heavy code. I think it really just depends.
As I think others have pointed, code that is confusing on an algorithmic level is not the common case. Also, I do not think that 'normal' methods of documentation would work all that well in these cases either. What you would really want as documentation for this type of function is more along the lines of a proof of the algorithm being used. Then you just implement the algorithm in as close a way possible to the construction in the proof and document deviations.
At one point I was browsing through a crypto-library (I forget which one). Almost all of the crypto functions used short variable names, and had only a one line comment. That comment is a citation for the paper that the algorithm came from. Ignoring the fact that the code authors were not the source of the crypto (so putting the explanation in the source may be plagourism or such), source code is not a good format for this type of thing.
The following code is probably my favorite, "What will this program output?" example. This is taken from the Quake III Arena code in "q_math.c" [1].
Note line 561 [2]. Non-obfuscated code, and one is left wondering... just what, exactly, is going on at that line?
I understand that the point of the paper is to analyze code without having comments to help, but it serves as a reminder to me that commenting is important in helping not only other developers understand the code, but to help myself when revisiting code particulars that may have faded from memory.
I find this to be a great example when I hear, "I don't comment; the code itself is self-documenting."
[1] https://github.com/id-Software/Quake-III-Arena/blob/master/c...[2] https://en.wikipedia.org/wiki/Fast_inverse_square_root