[Python-ideas] INSANE FLOAT PERFORMANCE!!!
Elliot Gorokhovsky
elliot.gorokhovsky at gmail.com
Tue Oct 11 12:16:45 EDT 2016
Warning: the contents of this message may be dangerous for readers with
heart conditions.
So today I looked at PyFloat_RichCompare. I had been scared of it initially
because it's so complicated; I was worried I would miss some important
error check if I special-cased it. So I took the following approach: I
replaced every occurrence of PyLong_Check(w) with 0, and every occurrence
of PyFloat_Check(w) with 1, because that's the whole point: we're cutting
out the type checks. I then cut out all the blocks guarded by if (0). So
it's definitely safe.
And it turns out that if you do that, PyFloat_RichCompare becomes ONE LINE
(after we set op=Py_LT)!!! Just
return (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w)) == 0 ? Py_False : Py_True;
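For concreteness, here is a minimal sketch of that specialized comparison
wrapped up as its own helper. The name and exact signature are illustrative
only, not necessarily what the actual patch uses:

/* Illustrative sketch only: a "less than" for lists already known to
 * contain nothing but floats, so every type check and error branch
 * from PyFloat_RichCompare can be dropped. */
static int
float_lt_sketch(PyObject *v, PyObject *w)
{
    return PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w);
}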
So I thought, wow, this will give some nice numbers! But I underestimated
the power of this optimization. You have no idea. It's crazy.
Since we're dealing with floats, I used Tim Peters' benchmark,
Lib/test/sortperf.py, just modifying one function:
# Old function Tim Peters wrote:
def doit_fast(L):
    t0 = time.perf_counter()
    L.fastsort()
    t1 = time.perf_counter()
    print("%6.2f" % (t1-t0), end=' ')
    flush()

# My function:
def doit(L):
    F = FastList(L)
    f0 = time.perf_counter()
    F.fastsort()
    f1 = time.perf_counter()
    F = FastList(L)
    t0 = time.perf_counter()
    F.sort()
    t1 = time.perf_counter()
    print("%6.2f%%" % (100*(1-(f1-f0)/(t1-t0))), end=' ')
    flush()
So the benchmarking here is valid: I didn't write it; all I did was modify
it to print percent improvement instead of sort time. The numbers below are
the percent improvement of my sort over the default sort (just clone my repo
and run python sortperf.py to verify):
 i    2**i    *sort   \sort   /sort   3sort   +sort   %sort   ~sort   =sort   !sort
15   32768   44.11%  54.69%  47.41%  57.61%  50.17%  75.24%  68.03%  65.16%  82.16%
16   65536   14.14%  75.38%  63.44%  56.56%  67.99%  66.19%  50.72%  61.55%  61.87%
17  131072   73.54%  60.52%  60.97%  61.63%  52.55%  49.84%  68.68%  84.12%  65.90%
18  262144   54.19%  55.34%  54.67%  54.13%  55.62%  52.88%  69.30%  74.86%  72.66%
19  524288   55.12%  53.32%  53.77%  54.27%  52.97%  53.53%  67.55%  76.60%  78.56%
20 1048576   55.05%  51.09%  60.05%  50.69%  62.98%  50.20%  66.24%  71.47%  61.40%
This is just insane. This is crazy. I didn't write this benchmark; it's a
valid benchmark that Tim Peters wrote. And nearly every trial shows an
improvement of 50% or more, some in the 70s and 80s. This is crazy. This is
so cool. I just wanted to share this with you guys. I'll submit a patch to
bugs.python.org soon; I just have to write a special-case comparison for
tuples and then I'm basically done. This is so cool!!! 50%, man!!! Crazy!!!
Elliot
On Mon, Oct 10, 2016 at 9:02 PM Tim Peters <tim.peters at gmail.com> wrote:
> [please restrict follow-ups to python-ideas]
>
> Let's not get hung up on meta-discussion here - I always thought "massive
> clusterf**k" was a precise technical term anyway ;-)
>
> While timing certainly needs to be done more carefully, it's obvious to me
> that this approach _should_ pay off significantly when it applies.
> Comparisons are extraordinarily expensive in Python, precisely because of
> the maze of test-and-branch code it requires just to figure out which
> bottom-level comparison function to invoke each time. That's why I spent
> months of my life (overall) devising a sequence of sorting algorithms for
> Python that reduced the number of comparisons needed.
>
> Note that when Python's current sort was adopted in Java, they still kept
> a quicksort variant for "unboxed" builtin types. The adaptive merge sort
> incurs many overheads that often cost more than they save unless
> comparisons are in fact very expensive compared to the cost of pointer
> copying (and in Java comparison of unboxed types is cheap). Indeed, for
> native numeric types, where comparison is dirt cheap, quicksort generally
> runs faster than mergesort despite that the former does _more_ comparisons
> (because mergesort does so much more pointer-copying).
>
> I had considered something "like this" for Python 2, but didn't pursue it
> because comparison was defined between virtually any two types (34 < [1],
> etc), and people were careless about that (both by design and by
> accident). In Python 3, comparison "blows up" for absurdly mixed types, so
> specializing for homogeneously-typed lists is a more promising idea on the
> face of it.
>
> The comparisons needed to determine _whether_ a list's objects have a
> common type are just len(list)-1 C-level pointer comparisons, and so go
> fast. So I expect that, when it applies, this would speed even sorting an
> already-ordered list with at least 2 elements.
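A minimal sketch of the kind of pre-scan described above, assuming the sort
can walk the list's item array directly; the function name is illustrative:

/* Illustrative sketch: one C-level pointer comparison per element to
 * decide whether the whole list shares a single exact type. */
static int
all_same_type_sketch(PyObject **items, Py_ssize_t n)
{
    Py_ssize_t i;
    PyTypeObject *t;
    if (n < 2)
        return 1;
    t = Py_TYPE(items[0]);
    for (i = 1; i < n; i++) {
        if (Py_TYPE(items[i]) != t)
            return 0;  /* mixed types: fall back to generic rich comparisons */
    }
    return 1;
}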
>
> For a mixed-type list with at least 2 elements, it will always be pure
> loss. But (a) I expect such lists are uncommon (and especially uncommon in
> Python 3); and (b) a one-time scan doing C-level pointer comparisons until
> finding a mismatched type is bound to be a relatively tiny cost compared to
> the expense of all the "rich comparisons" that follow.
>
> So +1 from me on pursuing this.
>
> Elliot, please:
>
> - Keep this on python-ideas. python-dev is for current issues in Python
> development, not for speculating about changes.
>
> - Open an issue on the tracker: https://bugs.python.org/
>
> - At least browse the info for developers:
> https://docs.python.org/devguide/
>
> - Don't overlook Lib/test/sortperf.py. As is, it should be a good test of
> what your approach so far _doesn't_ help, since it sorts only lists of
> floats (& I don't think you're special-casing them). If the timing results
> it reports aren't significantly hurt (and I expect they won't be), then add
> specialization for floats too and gloat about the speedup :-)
>
> - I expect tuples will also be worth specializing (complex sort keys are
> often implemented as tuples).
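As a rough illustration of that tuple case (not taken from any actual
patch), a specialized tuple "less than" might walk elements left to right
and fall back to the generic rich comparison only for the elements
themselves; real code would need proper handling of the error return:

/* Illustrative sketch: lexicographic tuple compare using CPython's
 * PyObject_RichCompareBool for the elements.  A negative return from
 * PyObject_RichCompareBool signals an error and is just propagated here. */
static int
tuple_lt_sketch(PyObject *v, PyObject *w)
{
    Py_ssize_t nv = PyTuple_GET_SIZE(v);
    Py_ssize_t nw = PyTuple_GET_SIZE(w);
    Py_ssize_t n = nv < nw ? nv : nw;
    Py_ssize_t i;
    for (i = 0; i < n; i++) {
        PyObject *a = PyTuple_GET_ITEM(v, i);
        PyObject *b = PyTuple_GET_ITEM(w, i);
        int eq = PyObject_RichCompareBool(a, b, Py_EQ);
        if (eq < 0)
            return -1;                  /* error in element comparison */
        if (!eq)
            return PyObject_RichCompareBool(a, b, Py_LT);
    }
    return nv < nw;                     /* the shorter tuple compares less */
}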
>
> Nice start! :-)
>
>