[pypy-dev] Optimize constant float division by multiplication?
Toni Mattis
toni.mattis at student.hpi.uni-potsdam.de
Wed Nov 5 19:07:54 CET 2014
Hello,
I discovered that PyPy's JIT generates "DIVSD" instructions on xmm
registers when dividing a float by a constant C. This consumes an order
of magnitude more CPU cycles than the corresponding "MULSD" instruction
with a precomputed 1/C.
I know that only powers of two have an exact reciprocal floating point
representation, but there might be a benefit in trading the least
significant digit for a more significant speedup.
So, is this a missed optimization (at least for reasonably accurate
cases), a present or possibly future option (like -ffast-math in gcc) or
are there more reasons against it?
Thanks,
Toni
--- PS: Small Example ---
This function takes on average 0.41 seconds to compute on an
array.array('d') with 10**8 elements between 0 and 1:
def spikes_div(data, threshold=1.99):
count = 0
for i in data:
if i / 0.5 > threshold:
count += 1
return count
Rewritten with a multiplication it takes about 0.29 seconds on average,
speeding it up by factor 1.4:
...
if i * 2.0 > threshold:
...
The traces contain the same instructions (except for the MULSD/DIVSD)
and run the same number of times. I'm working with a fresh translation
of the current PyPy default on Ubuntu 14.04 x64 with a 2nd generation
Core i7 CPU.
More information about the pypy-dev
mailing list