
Interesting paper.

The following code is probably my favorite, "What will this program output?" example. This is taken from the Quake III Arena code in "q_math.c" [1].

Note line 561 [2]. Non-obfuscated code, and one is left wondering... just what, exactly, is going on at that line?

I understand that the point of the paper is to analyze code without comments to help, but it serves as a reminder to me that commenting is important: it helps not only other developers understand the code, but also my future self when revisiting particulars that have faded from memory.

I find this to be a great example when I hear, "I don't comment; the code itself is self-documenting."

  552 float Q_rsqrt( float number )
  553 {
  554     long i;
  555     float x2, y;
  556     const float threehalfs = 1.5F;
  557
  558     x2 = number * 0.5F;
  559     y = number;
  560     i = * ( long * ) &y; // evil floating point bit level hacking
  561     i = 0x5f3759df - ( i >> 1 ); // what the fuck?
  562     y = * ( float * ) &i;
  563     y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
  564 //  y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
  565
  566 #ifndef Q3_VM
  567 #ifdef __linux__
  568     assert( !isnan(y) ); // bk010122 - FPE?
  569 #endif
  570 #endif
  571     return y;
  572 }
[1] https://github.com/id-Software/Quake-III-Arena/blob/master/c...

[2] https://en.wikipedia.org/wiki/Fast_inverse_square_root



I would probably write it this way today (well, actually I wouldn’t write it today, I would instead use the reciprocal square-root instructions provided by all the common architectures, but let’s pretend):

  #include <stdint.h>  // uint32_t
  #include <string.h>  // memcpy

  static uint32_t toRepresentation(float x) {
    uint32_t rep;
    memcpy(&rep, &x, sizeof x);
    return rep;
  }

  static float fromRepresentation(uint32_t rep) {
    float x;
    memcpy(&x, &rep, sizeof x);
    return x;
  }

  static float rsqrt_linearApproximation(float x) {
    uint32_t xrep = toRepresentation(x);
    uint32_t yrep = 0x5f3759df - xrep/2;
    return fromRepresentation(yrep);
  }

  static float rsqrt_newtonRaphsonStep(float x, float y) {
    return y*(1.5f - (0.5f*x)*y*y);
  }

  float Q_rsqrt(float x) {
    float y = rsqrt_linearApproximation(x);
    y = rsqrt_newtonRaphsonStep(x, y);
    y = rsqrt_newtonRaphsonStep(x, y);
    return y;
  }
The only semi-cryptic thing about it, to someone versed in numerics, is line with the magic number 0x5f3759df. While I would expect a professional to be able to quickly re-derive it and understand the intention, I would still write a comment, along the lines of “We approximate the floating-point representation of 1/sqrt(x) with a first-order minimax polynomial in the representation of x.”

All that said, I tend to write a lot of comments. Much of the code I write professionally is math library code, and it’s not uncommon for me to have a ratio of several paragraphs of error analysis or correctness proofs to a few lines of code.


It's not just semi-cryptic, it's magic. Your example above includes "xrep/2" instead of i>>1, but this does nothing to increase the understanding of what the fuck is going on here.

This is a floating point number and some deep juju bit twiddling is happening with it, it's not just being divided by 2. The exponent is being effectively divided by 2 and the fractional part of the mantissa is also being effectively divided by 2. Except in the case where the representation of the exponent ended in a 1, in which case that 1 is shifted down into the fractional part of the mantissa. Exactly why this is ok and doesn't hurt the approximation is a subject that could fill a report.


Floating point representations are approximately logarithms. It should be clear to anyone with working knowledge of floating point numerics that dividing by two and negating the representation of a floating point number corresponds to taking the reciprocal square root of the number. That's not magic, that's math.

The constant simply accounts for the bias in the floating point representation and balances the error over the domain of interest. Again, not magic.
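Sketching the standard derivation (following Lomont's analysis; the correction term sigma is tuned, approximately 0.045): for normalized x = 2^e (1 + m), read the bit pattern as an integer I_x and use log2(1 + m) ≈ m + sigma:

```latex
\begin{align*}
I_x &= 2^{23}\,(e_x + B + m_x), \qquad B = 127 \\
\log_2 x &\approx e_x + m_x + \sigma
  \;\Rightarrow\; I_x \approx 2^{23}\,(\log_2 x + B - \sigma) \\
\log_2 y &= -\tfrac{1}{2}\log_2 x
  \;\Rightarrow\; I_y \approx \tfrac{3}{2}\,2^{23}\,(B - \sigma) - \tfrac{1}{2} I_x \\
\tfrac{3}{2}\cdot 2^{23}\cdot(127 - 0.0450) &\approx \texttt{0x5f3759df}
\end{align*}
```

The subtraction `0x5f3759df - (i >> 1)` is exactly the last line: the constant absorbs the bias term, and the halved, negated representation does the rest.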

There is no "bit-twiddling" at all.


Explaining the details of the bit-twiddles does not make it any less bit-twiddling. And explaining the details behind the opaque steps does not make them any less opaque to the uninitiated.

You're using bit operations on a data type with a non-trivial bit structure. That's pretty much the definition of bit-twiddling. The fact that blunt bit operations on such a finely-defined structure can correspond to useful mathematical operations is non-obvious.


The fast inverse square root trick is clever, but this is right on. It's bypassing the provided interface for FP operations and mucking around with implementation details. That's not to say that that's a bad thing, or that floating point numbers aren't a leaky abstraction, or that the real world doesn't demand this sort of optimization from time to time. But it's still true.


>And explaining the details behind the opaque steps does not make them any less opaque to the uninitiated.

Emm, by definition it does.


> You're using bit operations on a data type with a non-trivial bit structure.

It’s using integer operations on an encoding that has a meaningful (approximate) interpretation as an integer.


I think a lot of people use the "self-documenting" excuse without knowing what "self-documenting" really is, and what it brings to the table. Self-documenting code is a good fit for a reasonably small and straightforward function. If you have a function called get_thingy() in the context of some sort of "thingy container" then you probably don't need to go out of your way to explain "This method gets a thingy from the thingy container."

This can even be expanded to more complex single-purpose functions that are not exposed directly to any external APIs. For instance, if you have a function called restart_thread() which first tries to stop a thread, then starts a new one, then you might be safe.

However, as soon as you leave the realm of the obvious, comments are a must. Clearly in situations where you're trying something clever, like that Q_rsqrt function, missing comments will boggle all but the most specialized professionals.

Unfortunately people often forget about another important type of comment: those that explain what function this piece of code serves in the program. I've lost count of the number of "Context"s or "Interface"s or "get_data"s I've seen when trying to read someone's code. In those situations a single line like "This context links the rendering pipeline to the physics simulation" would go a very long way.

Instead, when I see this sort of code, it will either have no comments at all, or it will be a dissertation on all the theoretical uses of the piece of code (omitting what it's actually used for in the current context).


> In those situations a single line like "This context links the rendering pipeline to the physics simulation" would go a very long way.

Why not rewrite your code so there is a rendering_pipeline variable and a physics_simulation variable and a function called link? Then the comment is redundant.

That's what refactoring for self-documentedness is to me.


While that is a perfectly reasonable solution to the minimal example I provided, there may be reasons to abstract away the concept.

Personally, I like having some sort of structure in memory that allows me to access and query contextual information. It's amazingly useful if you are trying to link a variety of different elements. In fact it's practically unavoidable if you are writing highly dynamic code. Trying to manage this sort of system with a few variables and functions would become a nightmare.

In fact, those dynamic situations are where comments describing layout are absolutely critical. In these situations you may be using your class as a generalization, and tracking the full extent of this sort of interaction could take a ridiculous amount of time.


Indeed. Document the Whys, and let the code itself document the Whats.


That is a great example. Some code just absolutely needs extra documentation around it. IMO, this is mostly true with more complicated or non-obvious algorithms like your example. There aren't really any more intelligible names you can give those variables; the algorithm just needs them.

Still, I find that with 90% of the code I write for work, it's not so difficult to come up with variable and method names that make it pretty clear what's going on. I take the approach that I'll add comments anywhere I think that naming might not be enough, but that very rarely happens.

Granted, I'm not writing low-level, math heavy code. I think it really just depends.


So beautiful. And yet, sadly, "for every 0x5f3759df we learn about, there are thousands we never see"[1]

[1]: http://www.xkcd.com/664/


I love that one, such a simple yet profound commentary on the life of an engineer.


As I think others have pointed out, code that is confusing on an algorithmic level is not the common case. Also, I do not think that 'normal' methods of documentation would work all that well in these cases either. What you would really want as documentation for this type of function is more along the lines of a proof of the algorithm being used. Then you just implement the algorithm as closely as possible to the construction in the proof and document deviations.

At one point I was browsing through a crypto library (I forget which one). Almost all of the crypto functions used short variable names and had only a one-line comment. That comment was a citation for the paper that the algorithm came from. Ignoring the fact that the code authors were not the source of the crypto (so putting the explanation in the source may be plagiarism or such), source code is not a good format for this type of thing.



