[Python-ideas] INSANE FLOAT PERFORMANCE!!!
Tim Peters
tim.peters at gmail.com
Tue Oct 11 23:06:19 EDT 2016
[Elliot Gorokhovsky]
> Warning: the contents of this message may be dangerous for readers with
> heart conditions.
It appears some people are offended by your exuberance. Ignore them ;-)
> ...
> And it turns out that if you do that, PyFloat_RichCompare becomes ONE LINE
> (after we set op=Py_LT)!!! Just
Right. And this is why floats will likely be the best case for your
approach: all the overheads of finding "the right" bottom-level
comparison are weighed against what's basically a single machine
instruction to do the actual work of comparing two floats.
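To make that concrete: once the sort knows every element is a float,
the bottom-level comparison can collapse to something like the sketch
below (name and signature made up for illustration - not necessarily
how your patch spells it). PyFloat_AS_DOUBLE is just a struct field
read, so the real work is one double comparison:

    /* Hypothetical sketch: a bottom-level less-than for a list known
       to contain only floats.  No type checks, no error paths - just
       read two fields and compare. */
    static int
    unsafe_float_lt(PyObject *v, PyObject *w)
    {
        return PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w);
    }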
> return (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w)) == 0 ? Py_False : Py_True;
In real life, this is better written as:
    return (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w)) ? Py_True : Py_False;
That is, the comparison to 0 was an artificial complication left over
from simplifying the code.
However, that code is buggy, in a way that may not show up for a long
time. Keeping track of reference counts is crucial. It needs to be
more like:
    PyObject *res;  /* Py_True and Py_False are shared singletons, so
                       hand the caller its own reference */
    res = (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w)) ? Py_True : Py_False;
    Py_INCREF(res);
    return res;
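CPython also has convenience macros that fold the Py_INCREF into the
return, so the same thing can be spelled without a temporary:

    if (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w))
        Py_RETURN_TRUE;   /* does Py_INCREF(Py_True) and returns it */
    Py_RETURN_FALSE;      /* likewise for Py_False */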
> So I thought, wow, this will give some nice numbers! But I underestimated
> the power of this optimization. You have no idea. It's crazy.
It's in the ballpark of what I expected :-)
> Since we're dealing with floats, I used Tim Peters' benchmark:
> Lib/test/sortperf.py, just modifying one function:
It's actually Guido's file, although I probably changed it much more
recently than he did.
As I explained in a different msg, it's _not_ a good benchmark. It's
good at distinguishing among "really really good", "not a huge
change", and "really really bad", but that's all. At the start, such
gross distinctions are all anyone should care about.
One thing I'd like to see: do exactly the same, but comment out your
float specialization. It's important to see whether sortperf.py says
"not a huge change" then (which I expect, but it needs confirmation).
That is, it's not enough if some cases get 4x faster if it's also that
other cases get much slower.
> ...
> # My function:
> def doit(L):
>     F = FastList(L)
>     f0 = time.perf_counter()
>     F.fastsort()
>     f1 = time.perf_counter()
>     F = FastList(L)
>     t0 = time.perf_counter()
>     F.sort()
>     t1 = time.perf_counter()
>     print("%6.2f%%" % (100*(1-(f1-f0)/(t1-t0))), end=' ')
>     flush()
Numbers in benchmarks always need to be explained, because they're
only clear to the person who wrote the code. From your code, you're
essentially computing
(old_time - new_time) / old_time * 100.0
but in an obfuscated way. So long as the new way is at least as fast,
the only possible values are between 0% and 100%. That's important to
note. Other people, e.g., measure these things by putting "new_time"
in the denominator. Or just compute a ratio, like old_time /
new_time.
The way you're doing it, e.g., "75%" means the new way took only a
quarter of the time of the old way - it means new_time is 75% smaller
than old_time. I'm comfortable with that, but it's not the _clearest_
way to display timing differences so dramatic; old_time / new_time
would be 4.0 in this case, which is easier to grasp at a glance.
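To see both on the same made-up timings, say old_time = 4.0 seconds
and new_time = 1.0 seconds:

    (old_time - new_time) / old_time * 100.0 = (4.0 - 1.0) / 4.0 * 100.0 = 75%
    old_time / new_time = 4.0 / 1.0 = 4.0

Same measurement either way, but the ratio form makes "four times as
fast" jump out.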
> So the benchmarking here is valid.
No, it sucks ;-) But it's perfectly adequate for what I wanted to see
from it :-)
> I didn't write it. All I did was modify
> it to print percent improvement instead of sort time. The numbers below are
> percent improvement of my sort vs default sort (just clone my repo and run
> python sortperf.py to verify):
>
>  i    2**i   *sort  \sort  /sort  3sort  +sort  %sort  ~sort  =sort  !sort
> 15   32768  44.11% 54.69% 47.41% 57.61% 50.17% 75.24% 68.03% 65.16% 82.16%
> 16   65536  14.14% 75.38% 63.44% 56.56% 67.99% 66.19% 50.72% 61.55% 61.87%
> 17  131072  73.54% 60.52% 60.97% 61.63% 52.55% 49.84% 68.68% 84.12% 65.90%
> 18  262144  54.19% 55.34% 54.67% 54.13% 55.62% 52.88% 69.30% 74.86% 72.66%
> 19  524288  55.12% 53.32% 53.77% 54.27% 52.97% 53.53% 67.55% 76.60% 78.56%
> 20 1048576  55.05% 51.09% 60.05% 50.69% 62.98% 50.20% 66.24% 71.47% 61.40%
If it _didn't_ suck, all the numbers in a column would be about the same :-)
A meta-note: when Guido first wrote sortperf.py, machines - and
Python! - were much slower. Now sorting 2**15 elements goes so fast
that this gross timing approach is essentially only good for making the
grossest distinctions. On my box today, even the slowest case (*sort
on 2**20 elements) takes under half a second. That's "why" the
numbers in each column are much more stable across the last 3 rows
than across the first 3 rows - the cases in the first 3 rows take so
little time that any timing glitch can skew them badly.
Note that you can pass arguments to sortperf.py to check any range of
power-of-2 values you like, but so long as you're aware of the
pitfalls, the output as-is is fine for me.
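For example (going from memory of sortperf.py's main() - two arguments
tabulate an inclusive range of powers of 2; check the file if I'm
misremembering), this produces one row per size from 2**15 through
2**20:

    python sortperf.py 15 20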
> This is just insane. This is crazy.
Yet nevertheless wholly expected ;-)
> ...
> This is so cool.
It is! Congratulations :-)