Re. the last point, I was trying to think of computations where 1) an efficient in-place version is possible, and 2) the most efficient out-of-place version is significantly faster than copying the input and running the in-place version.
In 1D convolutions, the in-place version would need to use O(filter size) scratch space for lookahead, but this doesn't seem like it would be too significant. However, it might start to become significant in higher-dimensional convolutions.
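To make the 1D case concrete, here is a toy sketch (names and the causal-filter convention are my own choice, not anything fixed above): an in-place causal convolution that only keeps the last `len(w) - 1` original input values in a small ring buffer, since each output overwrites an input that later positions still need.

```python
from collections import deque

def conv1d_inplace(x, w):
    """Causal 1D convolution computed in place on list x:
    out[i] = sum_k w[k] * x[i-k], zero-padded at the left edge.
    Uses only O(len(w)) scratch space for recently overwritten inputs."""
    F = len(w)
    # ring buffer holding the last F-1 *original* input values
    hist = deque([0.0] * (F - 1), maxlen=F - 1)
    for i in range(len(x)):
        cur = x[i]  # original value, before we overwrite it
        acc = w[0] * cur
        for k in range(1, F):
            # hist[-k] is the original x[i-k]; zeros beyond the left edge
            acc += w[k] * hist[-k]
        hist.append(cur)  # remember original x[i] for later positions
        x[i] = acc        # overwrite in place
    return x
```

E.g. `conv1d_inplace([1, 2, 3, 4], [1, 1])` gives `[1, 3, 5, 7]` (each element plus its left neighbor). The scratch space is exactly the filter-size lookbehind mentioned above; in higher dimensions the analogous buffer grows to O(filter size × row/plane size), which is where it could start to hurt.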
In Big-O terms there will not be any difference: copying the data is O(N), and whatever the op does is at least O(N), so the asymptotic cost is unchanged.
But in absolute terms, it could make a difference. Think of y = x + 1 vs y = x.copy(); y += 1: I would expect the former to be slightly faster, since it passes over the memory once instead of twice. But actually I'm not really sure.
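This is easy to measure; a rough NumPy timing sketch (array size and repetition count are arbitrary, and note that plain `y = x` would only alias in NumPy, so the copy has to be explicit):

```python
import timeit
import numpy as np

x = np.random.rand(1_000_000)

# out-of-place: one pass, allocates y and writes x[i] + 1 directly
t_out = timeit.timeit(lambda: x + 1, number=10)

def copy_then_inplace():
    # copy + in-place: two passes over the memory (copy, then increment)
    y = x.copy()
    y += 1
    return y

t_copy = timeit.timeit(copy_then_inplace, number=10)

print(f"out-of-place: {t_out:.4f}s, copy+in-place: {t_copy:.4f}s")
```

Both variants produce the same result; any gap would come purely from the extra memory traffic of the copy, so it should shrink for small arrays that fit in cache.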
Actually, I implemented most of my native ops exactly this way: I implemented the in-place version, and the non-in-place version just copies the input and then calls the in-place version.
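That pattern is just a couple of lines per op; a minimal sketch with a hypothetical `relu` op as the example (not any specific op from above):

```python
import numpy as np

def relu_inplace(x):
    """In-place kernel: clamps negatives to zero, writing into x's own buffer."""
    np.maximum(x, 0, out=x)
    return x

def relu(x):
    """Out-of-place wrapper: copy the input, then reuse the in-place kernel."""
    return relu_inplace(x.copy())
```

The wrapper is trivially correct as long as the in-place kernel is, which is the appeal; the open question above is exactly when a dedicated out-of-place kernel could beat this copy-then-mutate wrapper by more than a constant factor's worth.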
Any particular example that occurs to you?