Hacker News

> Premature optimization is not really a thing

Okay. I'm going to stop this thread right there and take the opportunity to provide some mentoring. I hope you accept this, as it will help your career.

Read this paper. It is a classic.

https://pic.plover.com/knuth-GOTO.pdf



You should read the paper a bit more closely.

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified"

We know that cache misses are not a small inefficiency. This has been measured and observed on many real systems as a real, systemic problem. It's why data-oriented design is currently king of the game-engine world: cache misses kill performance. As a result, it is not premature to warn against them as a general practice, since that kind of systemic de-optimization will likely impact the critical 3%.


I think you may want to read the quote from the paper a bit more carefully. "...he will be wise to look carefully at the critical code; but only after that code has been identified"

I said "premature optimization is not really a thing" in response to a reply I received claiming that pImpls should be avoided at all costs.

When we analyze the performance impact of software, we don't shotgun change things because of a generalized fear of cache misses. We examine critical code paths and make changes there based on profile feedback. That is the spirit of what Knuth is saying in this quote. Look carefully at critical code, BUT ONLY AFTER that code has been identified.

A cache miss is critical when it is in a critical path. So we write interfaces with this in mind. Compilation time matters, as does runtime performance. Either way, we identify performance bottlenecks as they come up and we optimize them. Avoiding a clearer coding style, such as encapsulation, because doing so MIGHT produce faster code is counter-productive.

We can apply the Pareto Principle to understand that 80% of the performance overhead can be found in 20% of the code. The remaining 80% of the code can use pImpls, or could be rewritten in Haskell for all the difference it makes to performance. But that code still needs to be compiled, often by clients who only have headers and library files. Subjecting them to long compiles to gain an unimportant improvement in code that only rarely gets called is a bad trade. Spend that time optimizing the parts of the code that matter, which, as Knuth says, should only be done after that code has been identified.

EDIT: the downvoting on this comment is amusing, given that "avoid pImpls" is exactly the sort of 97% cruft that Knuth was addressing.


> EDIT: the downvoting on this comment is amusing, given that "avoid pImpls" is exactly the sort of 97% cruft that Knuth was addressing

Again, no, it isn't. You seem to be severely underestimating the systemic impact of cache misses if you are considering them a "small" impact to efficiency.

It's a well-proven, well-known problem. Ignoring it falls under Knuth's guidance of "A good programmer will not be lulled into complacency by such reasoning."

pImpls are the sort of thing you use at API boundaries to avoid leaking implementation details to users, trading efficiency and added complexity for a more stable API boundary. Scattering them throughout your code base would be like compiling with -O0: a nonsensical waste of a user's constrained resources, paying in code complexity for a slight gain in compile time.

Or, alternatively, using pImpls to optimize for compile time is a premature optimization. You should only optimize for compile time in at most 3% of your source files, ever. The other 97% of your source files should be written for clarity & simplicity, which means no pImpls.


> Again, no, it isn't. You seem to be severely underestimating the systemic impact of cache misses if you are considering them a "small" impact to efficiency.

You are severely overestimating the impact of cache misses if you think that all indirection must be eliminated and any use of pImpls at all is always wrong, as you seem to be implying.

> pImpls are the sort of thing you use at API boundaries to avoid leaking implementation details into users, but that's trading efficiency & complexity for a more stable API boundary. Scattering them throughout your code base would be like compiling with -O0.

It's a good thing that I never advocated using them everywhere then. Where did you read me saying this?

> Or, alternatively, using pImpls to optimize for compile time is a premature optimization.

Only if this is done by default, which I have not advocated for anywhere in this thread. I called it a tool in the toolbox. I mentioned it as one of several possibilities. Somehow you have translated this into "use pImpls everywhere", which is a strawman.


You may not be aware, but your comment came off as somewhat condescending, given that you don't really have any idea where the parent poster is coming from or what their background is.


If someone says that premature optimization isn't a thing, I don't think it is condescending to point out that it is by posting original source material. :-)


> it only wastes the programmer’s time. ... Wasting CPU time on the other hand bothers me a lot!

As always the answer is... it depends. Programmer time costs money, CPU time is cheap by comparison.

If you're building something that runs occasionally, or is IO/UI/network bound, CPU time is largely irrelevant. But if you're building something that runs in a tight loop, or a library that will be compiled into millions of lines of code, then the extra programmer time will absolutely be worth the ROI.


Yes, but, again, this article is about standard headers. Since it is impossible for STL library authors to decide that they are or are not writing for a performance-sensitive audience, it behooves them to provide completely visible header-only implementations of everything.


The pImpl pattern costs programmer time as it's more code to write. By contrast compiling is just CPU time and you can trivially throw a bigger workstation at the problem.


This depends on who is writing the code and who is compiling the headers. A software developer who is building headers for someone else (an internal or external client) may trade the overhead of this pattern for a faster compilation time. Reducing compilation time by 80% may be well worth the overhead of adding 10% more code to an interface.

It is not always possible to just throw more hardware at the problem of compilation. For instance, one may be using a build pipeline that requires specific steps to be followed as part of gating tasks. The time it takes to compile code over and over again for unit testing, behavioral testing, acceptance testing, integration testing, etc., each impacts delivery time and handoff.

Earlier in my career, I worked with a code base that was approximately 10 million lines of code in size. Compiling this code base would take approximately 7 hours on the best hardware we could buy. The C++ developers were adamant about ensuring that their headers were "complete", as they called it. With a few changes, such as forward declarations, abstract interfaces, and encapsulation, my team was able to reduce that compile time to less than 35 minutes. Based on profile feedback, we saw less than a quarter of a percent difference in overhead. The speedup to developer workflows was so significant that the work was a better use of our time, in overall cost, than working on a billable project.

Most projects aren't nearly that bad, but it does go to show that it is possible to significantly reduce compilation time without significantly impacting runtime performance, even in C++.
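As a sketch of the cheapest of those techniques, forward declarations: a header that holds only pointers or references to a type never needs that type's full definition, so the type's heavy header drops out of every client's include graph and into a single .cpp file. (`Logger` and `ConnectionPool` are made-up names for illustration, not from the code base described above.)

```cpp
// logger.h (sketch): ConnectionPool is forward-declared, not #included,
// so clients of Logger never pay to parse ConnectionPool's header.
class ConnectionPool;                  // forward declaration suffices

class Logger {
public:
    explicit Logger(ConnectionPool& pool);
    void log(const char* msg);
    int messages_sent() const;
private:
    ConnectionPool* pool_;             // a pointer needs only the declaration
    int count_ = 0;
};

// logger.cpp (sketch): only this file needs the complete type.
class ConnectionPool { /* heavyweight implementation details */ };

Logger::Logger(ConnectionPool& pool) : pool_(&pool) {}
void Logger::log(const char*) { ++count_; }   // a real version would use pool_
int Logger::messages_sent() const { return count_; }
```

The runtime code is unchanged; only the include graph shrinks, which is why this kind of change barely registers in profiles.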


> It is not always possible to just throw more hardware at the problem of compilation. For instance, one may be using a build pipeline that requires specific steps to be followed as part of gating tasks. The time it takes to compile code over and over again for unit testing, behavioral testing, acceptance testing, integration testing, etc., each impacts delivery time and handoff.

All of that is solved by throwing more hardware at it.

Alternatively if compile time is not the slow part of that pipeline, then you're prematurely optimizing the wrong thing anyway.

> Earlier in my career, I worked with a code base that was approximately 10 million lines of code in size. Compiling this code base would take approximately 7 hours on the best hardware we could buy. The C++ developers were adamant about ensuring that their headers were "complete" as they called it. With a few changes, such as forward declarations, abstract interfaces, and encapsulation, my team was able to reduce that compile time to less than 35 minutes.

In other words you only optimized the critical 3% of the codebase rather than prematurely optimizing everything with pImpl abstractions?


> Alternatively if compile time is not the slow part of that pipeline, then you're prematurely optimizing the wrong thing anyway.

Developer productivity does not matter in your world?

> In other words you only optimized the critical 3% of the codebase rather than prematurely optimizing everything with pImpl abstractions?

Yes, because nowhere in this thread have I advocated prematurely optimizing everything with pImpl abstractions. Those are words you have put in my mouth. pImpl abstraction is a single tool that can be used to improve compile-time performance. Not all the time, but in a fraction of the 3% of the time where it is appropriate.


Well, to the extent that it is a thing it only wastes the programmer’s time. I happen to think programmers are spending too little time, so that doesn’t bother me. Wasting CPU time on the other hand bothers me a lot!


Software still needs to be architected for performance from the start. Trying to micro-optimize a loop before you know you need to is what Knuth was saying to avoid.


Right. Much like avoiding pImpl because it might make a function call that occurs 0.001% of the time faster. That is the basis of the thread I was replying to.

Understanding what you are optimizing FOR and where the most attention should be spent is the crux of Knuth's argument. Trying to be clever up-front is often counter-productive.

There is nothing wrong with making some architectural decisions up front, but that is very different from avoiding pImpls at all costs because indirection is slower. Indirection doesn't always matter, and it should only be tackled when and where it does.


Well pImpl itself could be a premature optimization.


It's not an optimisation. It's to encapsulate logic which does not need to be present in public headers.

Given that it makes every object instantiation perform a memory allocation, followed by the indirection required to access it, and that it also prevents generation of the default copy constructor, assignment operator, etc. due to the use of unique_ptr, it adds complexity as well as two sources of performance loss.

As a result, I would use this pattern only where strictly necessary. For example, I've used it when wrapping C libraries with C++ classes. It means the C headers are included only in the implementation file, not the public C++ headers, keeping the interface cleaner. Or I might use it in situations where private members drag in dozens of headers which expose a lot of unnecessary implementation details to the user, or which pull in transitive library dependencies simply by #including them. The fact that compilation might be marginally faster is incidental to this.
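To make those costs concrete, here is a minimal pImpl sketch (the `Widget` name and its members are hypothetical): every object pays one heap allocation plus a pointer indirection on each access, and because the member is a `std::unique_ptr`, the class is move-only unless copy operations are written by hand.

```cpp
#include <memory>

// widget.h (sketch): the public header exposes no implementation details.
class Widget {
public:
    Widget();
    ~Widget();                        // defined out of line, where Impl is complete
    Widget(Widget&&) noexcept;        // move-only: unique_ptr suppresses copies
    Widget& operator=(Widget&&) noexcept;
    int value() const;
private:
    struct Impl;                      // forward declaration only
    std::unique_ptr<Impl> pimpl_;     // one heap allocation per object
};

// widget.cpp (sketch): private members live here, invisible to clients.
struct Widget::Impl {
    int value = 42;
};

Widget::Widget() : pimpl_(std::make_unique<Impl>()) {}
Widget::~Widget() = default;          // needs Impl complete, hence out of line
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;
int Widget::value() const { return pimpl_->value; }  // extra indirection per call
```

Changing `Impl` now recompiles only widget.cpp. That insulation is the benefit being weighed against the allocation and indirection above.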


The "pimpl idiom"[0] is about insulation, not optimization. What it affords is ensuring collaborators have no knowledge of a type's implementation details (data as well as private methods), which also has the byproduct of allowing for faster compilation times.

HTH

0 - https://cpppatterns.com/patterns/pimpl.html


AKA Bridge Pattern in GoF-speak

https://en.wikipedia.org/wiki/Bridge_pattern


Sure, and I would never recommend that indirection be used all the time as that would be premature.

However, if compilation times have gotten painful enough that we need to examine performance improvements to our headers, the pImpl pattern is one of many tools in the toolbox. So are forward headers and other compiler firewall techniques.


Knuth wasn't saying "just ignore performance altogether", he was saying "stop making things needlessly complicated for the last bit of juice".


For instance, pulling in the STL for an interface header instead of encapsulating these details. :-)

No one is claiming that we should ignore performance altogether. But understanding through profiling where performance issues are, and designing toward a faster implementation, is more important than trying to inline definitions up front.


Do you also write all your functions to pass large inputs by value until a profiler says you can pass a const reference?

Some implementation details are so well understood that you really don't need a profiler to do what is probably the right thing by default.


> Do you also write all your functions to pass large inputs by value until a profiler says you can pass a const reference?

No, but neither do I pass types that fit in a native integer by const reference because copies should be avoided at all costs. There is always a tradeoff.

> Some implementation details are so well understood that you really don't need a profiler to do what is probably the right thing by default.

Avoiding any and all indirection at all costs is not one of these.

The pImpl pattern, much like virtual methods, function pointers, etc., is just another tool. Indirection is a trade-off that either is or is not worth the expense in cache misses. The cost is not as cut and dried as others in the thread have assumed.
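The by-value versus const-reference example above is the same kind of trade-off. A minimal illustration (hypothetical functions), following the common rule of passing register-sized types by value and larger types by reference:

```cpp
#include <cstddef>
#include <string>

// Small, register-sized types: pass by value; the copy is as cheap as a pointer.
int square(int x) { return x * x; }

// Large, heap-owning types: pass by const reference to avoid a deep copy.
std::size_t length(const std::string& s) { return s.size(); }
```

Neither choice needs a profiler; the point is that each is a judgment about costs, not a blanket rule against copies or against indirection.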


> Avoiding any and all indirection at all costs is not one of these.

"Any and all"? Perhaps not. However, I believe the context here was standard library headers. Those are full of small, often-used functions, so avoiding idioms based on indirection such as pImpl is about as close to a black and white rule as you're ever going to find in the programming world.


Nowhere did I say that pImpls should only be used. That was only one of several strategies I discussed.

The article may have discussed standard headers, but it was neither titled to indicate that it was talking about only standard headers, nor are the problems it discussed localized only to standard headers. My original comment did not limit the discussion to only standard headers.



