Hi all,
I hope this is the right place to ask my questions.
Recently I did some simple benchmarks on the numpy.conjugate routine for a 128 MB array, let's call it 'A'.
It turned out that on my test machines there is a speedup of around 2.5x if, instead of simply calling
A.conj()
one loops over sub-matrices that fit into the L1 cache of my CPU, like:
for i in range(0, A.shape[0], size):
    A[i:i+size].conj()
such that each A[i:i+size] fits into the L1 cache.
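For reference, a minimal, self-contained sketch of the comparison I mean (the array shape, dtype, and block size here are illustrative choices, not the exact values from my benchmark; the block size would need to be tuned so each chunk fits in L1):

```python
import numpy as np

# Illustrative array; my actual benchmark used a 128 MB array.
rng = np.random.default_rng(0)
A = (rng.random((1024, 1024)) + 1j * rng.random((1024, 1024))).astype(np.complex64)

def conj_full(A):
    # Conjugate the whole array in one call.
    return A.conj()

def conj_blocked(A, size=128):
    # Conjugate row-blocks so each A[i:i+size] slice stays cache-resident.
    out = np.empty_like(A)
    for i in range(0, A.shape[0], size):
        out[i:i+size] = A[i:i+size].conj()
    return out

# Sanity check: both approaches produce identical results.
assert np.array_equal(conj_full(A), conj_blocked(A))
```

Timing conj_full versus conj_blocked (e.g. with timeit) is what produced the ~2.5x difference on my machines.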
I posted example code and some graphs over at stackoverflow (https://stackoverflow.com/questions/73209565/strange-behaviour-during-multi…)
I quickly checked and found similar behavior for numpy.square.
Now for my questions.
1. Is this known/expected behavior of NumPy?
2. Would it be possible/sensible to make simple numerical operations like numpy.conjugate and numpy.square cache-aware?
Best,
Tim