Just use C, C++, Fortran, Julia, or even Lua.
The issue of "slow loops" is entirely self-inflicted by some languages. It's getting ridiculous to still worry about this shit in 2022.
Simple loops can and should be just as fast as vectorized programs. When they are slower, it is 100% due to deliberate decisions by the language designers.
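For illustration, here is a minimal sketch of the kind of "simple loop" meant here, written in plain C++ (the function names `saxpy` and `dot` are just illustrative). With optimization enabled (for example `-O3` on GCC or Clang), mainstream compilers typically auto-vectorize loops like these into SIMD code. One caveat: a floating-point sum reduction is usually only vectorized when reassociation is allowed (for example `-ffast-math`), because reordering the additions changes rounding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y[i] += a * x[i]: no loop-carried dependency, so each iteration is
// independent and the compiler can process several elements per instruction.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += a * x[i];
    }
}

// Plain reduction loop. Splitting `sum` across SIMD lanes reorders the
// floating-point additions, so compilers generally only do it when
// reassociation is permitted (e.g. -ffast-math).
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```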
BLAS is great and must be used whenever possible. That was not my point.
I don't want to turn my back on the beautiful loop notation. For some algorithms, the loop notation is clearer than any "vectorized" version. It is absurd that the language penalizes you for that. Loops are alright.
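One example where the loop is arguably the clearest notation is a recurrence, where each output depends on the previous output. The sketch below (a hypothetical `ema` helper, assuming a simple exponential moving average with smoothing factor `alpha`) states the recurrence directly; an array-style version needs a scan or filter primitive instead of a plain elementwise operation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Exponential moving average:
//   y[0] = x[0]
//   y[i] = alpha * x[i] + (1 - alpha) * y[i - 1]
// The dependency on y[i - 1] is written exactly as the math reads.
std::vector<double> ema(const std::vector<double>& x, double alpha) {
    assert(alpha >= 0.0 && alpha <= 1.0);
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = (i == 0) ? x[0] : alpha * x[i] + (1.0 - alpha) * y[i - 1];
    }
    return y;
}
```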
Compilers can be pretty good if you help them out a bit. Here's my implementation of Einstein reductions (including summations) in C++, which generates pretty close to ideal code until you start getting into processor-architecture-specific optimizations: https://github.com/dsharlet/array#einstein-reductions
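To make "help them out a bit" concrete, here is a hand-written sketch of one Einstein reduction, C(i, j) = sum over k of A(i, k) * B(k, j), as plain loops. This is not the dsharlet/array API; the `Matrix` type and `matmul` function are hypothetical, assuming row-major storage in a flat vector. The main "help" is loop order: with i-k-j, the innermost loop walks contiguous memory and each C(i, j) still accumulates over k in the original order, so the compiler can vectorize over j without reassociating any sums.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major dense matrix stored in a flat vector.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// C(i, j) = sum_k A(i, k) * B(k, j), written as loops in i-k-j order.
// The innermost j loop reads a row of B and updates a row of C, both
// contiguous, which makes it a good auto-vectorization candidate.
Matrix matmul(const Matrix& A, const Matrix& B) {
    assert(A.cols == B.rows);
    Matrix C(A.rows, B.cols);
    for (std::size_t i = 0; i < A.rows; ++i) {
        for (std::size_t k = 0; k < A.cols; ++k) {
            const double a = A(i, k);
            for (std::size_t j = 0; j < B.cols; ++j) {
                C(i, j) += a * B(k, j);
            }
        }
    }
    return C;
}
```

Beyond this point (cache blocking, register tiling, architecture-specific kernels), BLAS libraries still tend to win.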