[Python-ideas] INSANE FLOAT PERFORMANCE!!!

Tim Peters tim.peters at gmail.com
Tue Oct 11 23:06:19 EDT 2016


[Elliot Gorokhovsky]
> Warning: the contents of this message may be dangerous for readers with
> heart conditions.

It appears some people are offended by your exuberance.  Ignore them ;-)


> ...
> And it turns out that if you do that, PyFloat_RichCompare becomes ONE LINE
> (after we set op=Py_LT)!!! Just

Right.  And this is why floats will likely be the best case for your
approach:  all the overheads of finding "the right" bottom-level
comparison are weighed against what's basically a single machine
instruction to do the actual work of comparing two floats.
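
To make that concrete, here's a minimal sketch of the kind of
pre-validated comparison being discussed.  The name and exact shape
are my illustration, not necessarily your patch's actual code:

#include <Python.h>

/* Safe only because the sort has already verified that every element
   is exactly a float, so the unchecked macro access can't blow up. */
static int
unsafe_float_lt(PyObject *v, PyObject *w)
{
    return PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w);
}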

> return (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w)) == 0 ? Py_False : Py_True;

In real life, this is better written as:

return (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w)) ? Py_True : Py_False;

That is, the comparison to 0 was an artificial complication left over
from simplifying the code.

However, that code is buggy, in a way that may not show up for a long
time.  Keeping track of reference counts is crucial.  It needs to be
more like:

PyObject *res;
res = (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w)) ? Py_True : Py_False;
Py_INCREF(res);  /* Py_True and Py_False are shared singletons - the
                    caller gets a new reference, so bump the count */
return res;
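
Equivalently, CPython's Py_RETURN_TRUE / Py_RETURN_FALSE convenience
macros do the Py_INCREF-and-return for you:

/* Same semantics as the res/Py_INCREF dance above. */
if (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w))
    Py_RETURN_TRUE;
Py_RETURN_FALSE;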



> So I thought, wow, this will give some nice numbers! But I underestimated
> the power of this optimization. You have no idea. It's crazy.

It's in the ballpark of what I expected :-)


> Since we're dealing with floats, I used Tim Peters' benchmark:
> Lib/test/sortperf.py, just modifying one function:

It's actually Guido's file, although I probably changed it much more
recently than he did.

As I explained in a different msg, it's _not_ a good benchmark.  It's
good at distinguishing among "really really good", "not a huge
change", and "really really bad", but that's all.  At the start, such
gross distinctions are all anyone should care about.

One thing I'd like to see:  do exactly the same, but comment out your
float specialization.  It's important to see whether sortperf.py says
"not a huge change" then (which I expect, but it needs confirmation).
That is, it's not enough that some cases get 4x faster if other cases
get much slower.


> ...
> #My function:
> def doit(L):
>     F = FastList(L)
>     f0 = time.perf_counter()
>     F.fastsort()
>     f1 = time.perf_counter()
>     F = FastList(L)
>     t0 = time.perf_counter()
>     F.sort()
>     t1 = time.perf_counter()
>     print("%6.2f%%" % (100*(1-(f1-f0)/(t1-t0))), end=' ')
>     flush()

Numbers in benchmarks always need to be explained, because they're
only clear to the person who wrote the code.  From your code, you're
essentially computing

    (old_time - new_time) / old_time * 100.0

but in an obfuscated way (here old_time is t1-t0 and new_time is
f1-f0).  So long as the new way is at least as fast, the only
possible values are between 0% and 100%.  That's important to
note.  Other people, e.g., measure these things by putting "new_time"
in the denominator.  Or just compute a ratio, like old_time /
new_time.

The way you're doing it, e.g., "75%" means the new way took only a
quarter of the time of the old way - it means new_time is 75% smaller
than old_time.  I'm comfortable with that, but it's not the _clearest_
way to display timing differences so dramatic; old_time / new_time
would be 4.0 in this case, which is easier to grasp at a glance.
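
If it helps, here's a tiny standalone illustration (made-up numbers:
the old sort takes 1.00s, the new one 0.25s) of the same timings
reported all three ways:

#include <stdio.h>

int main(void)
{
    double old_time = 1.00, new_time = 0.25;  /* hypothetical times */
    /* your metric: percent reduction in time -> 75.00% */
    printf("%6.2f%%\n", (old_time - new_time) / old_time * 100.0);
    /* new_time in the denominator -> 300.00% */
    printf("%6.2f%%\n", (old_time - new_time) / new_time * 100.0);
    /* plain ratio -> 4.00x */
    printf("%6.2fx\n", old_time / new_time);
    return 0;
}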


> So the benchmarking here is valid.

No, it sucks ;-)  But it's perfectly adequate for what I wanted to see
from it :-)


> I didn't write it. All I did was modify
> it to print percent improvement instead of sort time. The numbers below are
> percent improvement of my sort vs default sort (just clone my repo and run
> python sortperf.py to verify):
>
>  i    2**i  *sort  \sort  /sort  3sort  +sort  %sort  ~sort  =sort  !sort
> 15   32768  44.11%  54.69%  47.41%  57.61%  50.17%  75.24%  68.03%  65.16%  82.16%
> 16   65536  14.14%  75.38%  63.44%  56.56%  67.99%  66.19%  50.72%  61.55%  61.87%
> 17  131072  73.54%  60.52%  60.97%  61.63%  52.55%  49.84%  68.68%  84.12%  65.90%
> 18  262144  54.19%  55.34%  54.67%  54.13%  55.62%  52.88%  69.30%  74.86%  72.66%
> 19  524288  55.12%  53.32%  53.77%  54.27%  52.97%  53.53%  67.55%  76.60%  78.56%
> 20 1048576  55.05%  51.09%  60.05%  50.69%  62.98%  50.20%  66.24%  71.47%  61.40%

If it _didn't_ suck, all the numbers in a column would be about the same :-)

A meta-note:  when Guido first wrote sortperf.py, machines - and
Python! - were much slower.  Now sorting 2**15 elements goes so fast
that this gross timing approach is good only for making the
grossest distinctions.  On my box today, even the slowest case (*sort
on 2**20 elements) takes under half a second.  That's "why" the
numbers in each column are much more stable across the last 3 rows
than across the first 3 rows - the cases in the first 3 rows take so
little time that any timing glitch can skew them badly.

Note that you can pass arguments to sortperf.py to check any range of
power-of-2 values you like but, so long as you're aware of the
pitfalls, the output as-is is fine for me.


> This is just insane. This is crazy.

Yet nevertheless wholly expected ;-)


> ...
> This is so cool.

It is!  Congratulations :-)

