sorting many arrays from one...

Tim Peters tim.one at comcast.net
Mon Jul 15 01:34:20 EDT 2002


[Jonathan Hogg]
> Oh well, in that case you can trim another couple of % off the calls with:
>
> Index: listobject.c
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Objects/listobject.c,v
> retrieving revision 2.116
> diff -r2.116 listobject.c
> 783c783
> <       res = PyEval_CallObject(compare, args);
> ---
> >       res = PyObject_Call(compare, args, NULL);
>
> Since PyEval_CallObject just does some unnecessary typechecks on the
> arguments.

Cool!  I checked it in.  On this box it gave a 2.4% speedup when sorting
lists of ints via list.sort(__builtin__.cmp), which is likely a best case
for it (a single type check costs as much as comparing two ints).

> Or being even more aggressive:
>
> Index: listobject.c
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Objects/listobject.c,v
> retrieving revision 2.116
> diff -r2.116 listobject.c
> 783c783
> <       res = PyEval_CallObject(compare, args);
> ---
> >       res = (*compare->ob_type->tp_call)(compare, args, NULL);
> 1297a1298,1302
> >       }
> >       if (compare && compare->ob_type->tp_call == NULL) {
> >               PyErr_Format(PyExc_TypeError, "'%s' object is not
> callable",
> >                            compare->ob_type->tp_name);
> >               return NULL;
>
> i.e., pull the remaining typecheck out to 'listsort'. But I don't
> much like this one as it gains only a tiny amount over the one above,
> seems quite a bit more obtuse (or is there a neater callable check
> macro?), and also subtly changes the semantics (sorting an empty list
> with a non-callable fails - though perhaps it should).

I'm glad you're not going to fight to the death for it, because I'd fight to
the death to keep it out <wink>.

> ...
> Yeah, I've had a short wander through the innards of cmp. It's a bit of a
> convoluted one, but necessarily so. I don't see how the calls can be made
> more optimal and cmp has to be the way it is, so I don't think
> '.sort(cmp)' can be made any faster without checking to see if 'compare'
> *is* 'cmp' and then not bothering to make the calls ;-)

There's a ~20% speedup remaining to be had, via abusing the immutability of
tuples and *reusing* the args tuple across docompare() calls, skipping the
PyTuple_New(2) and later decref.  Allocating and deallocating the args tuple
each time through is a real expense.  This is tricky to get right, though,
because docompare() needs to be reentrant, and allocating a dedicated tuple
per call cuts off a world of subtle problems.  I'm not going to pursue this
one.






More information about the Python-list mailing list