
> Perform a lookup assuming the value is between 0 and 16 (undefined behavior for out of range values)

I think you are misinterpreting the way that "undefined" is being used here. It's not a claim that one will get unpredictable results for this particular implementation, rather it's about the specification of the function. It's telling the user that the behavior of this function for out of range values is not guaranteed to remain the same across time as the code is changed, or across different architectures.
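To make the contract point concrete, here is a minimal scalar sketch. It is not the code under discussion: `lookup16` and `kExampleTable` are hypothetical names, and the valid range of 0 to 15 is my assumption for a 16-entry table. The point is that the documented precondition, not what this build happens to do, is what callers may rely on.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 16-entry table used only for illustration (squares of 0..15).
constexpr std::array<std::uint8_t, 16> kExampleTable = {
    0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225};

// Hypothetical helper, not taken from the library being discussed.
// Contract: idx must be in [0, 15]. The result for any other value is not
// specified by this contract and may change between versions or targets.
inline std::uint8_t lookup16(const std::array<std::uint8_t, 16>& table,
                             std::uint8_t idx) {
    // Debug builds catch contract violations; release builds do not check.
    assert(idx < 16 && "lookup16: index outside the documented range");
    // This version happens to mask the index. That is an implementation
    // detail, not a promise, so callers must not depend on it.
    return table[idx & 0x0F];
}
```

A different implementation of the same contract could return zero for out-of-range indices, or something else entirely, without breaking any caller that respects the precondition.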

> I’m not sure about this part: ... popcnt instruction is very fast

I haven't worked on this particular code, but I've coauthored a paper with Daniel on beating popcnt using AVX2 instructions: https://lemire.me/en/publication/arxiv1611.07612/. While you are right that at times saving L1 space is a greater priority, I'd bet that the approach used here was tested and found to be faster on Haswell. I'm not sure if you noticed that the page you linked is Haswell specific?



> rather it's about the specification of the function

That “function” compiles into a single CPU instruction. The OP is perfectly aware of that, that’s why really_inline is there.

> on beating popcnt using AVX2 instructions

It’s easy to do with pshufb when you have many values on input. I wrote about it years before that article; see here: https://github.com/Const-me/LookupTables#test-results
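For readers who haven't seen the trick, here is a portable scalar model of it, a sketch rather than the code from either link. A pshufb-based popcount keeps a 16-entry table of nibble bit counts in a register and looks up the low and high nibble of every byte in parallel. The functions below do the same lookup one byte at a time; the names `kNibblePopcount`, `popcount_byte`, and `popcount_bytes` are mine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bit count of every 4-bit value. A SIMD version would hold these same
// 16 bytes in a vector register and index them with pshufb.
constexpr std::uint8_t kNibblePopcount[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};

// Popcount of one byte: look up the low and high nibbles and add the counts.
inline unsigned popcount_byte(std::uint8_t b) {
    return kNibblePopcount[b & 0x0F] + kNibblePopcount[b >> 4];
}

// Popcount of a buffer. The SIMD version processes 16 or 32 bytes per
// iteration instead of one, which is where the speedup comes from.
inline std::size_t popcount_bytes(const std::uint8_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += popcount_byte(data[i]);
    }
    return total;
}
```

This only pays off against the popcnt instruction when there are many input bytes to amortize the table setup over, which is the "many values on input" condition above.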

> I'd bet that the approach used here was tested and found to be faster on Haswell

I'd bet it’s an error.

> if you noticed that the page you linked is Haswell specific

I did. I was disappointed, though; I expected to find something newer than Haswell from 2013, like Zen 2 or Skylake. With micro-optimizations like that, the exact micro-architecture matters.


> I expected to find something newer than Haswell from 2013, like Zen 2 or Skylake.

I'm sure optimizations for more recent architectures would be appreciated, and Daniel is wonderfully accepting of patches. Be careful though, or you might inadvertently end up as the maintainer of the whole project!


When I have free time, I’m generally more willing to contribute to my own open source projects no one cares about. Like this one: https://github.com/Const-me/Vrmac BTW, I did a substantial amount of SIMD work there, for both the 3D and 2D parts.



