sorting many arrays from one...

Wed Jul 10 08:28:37 EDT 2002

On 10/7/2002 11:38, in article l8UW8.47032$n4.11332104 at newsc.telia.net,
"Fredrik Lundh" <fredrik at pythonware.com> wrote:

> Jerzy Karczmarczuk wrote:
> 
>> Shall I repeat it once more? I think that the call overhead should be
>> perhaps better optimised, and if it seems difficult, I would like to
>> know why.
> 
> you have the full source code, like everyone else.  I suggest
> starting in the docompare() function in Objects/listobject.c...
> 
> if you find some small change that would cause a major speedup,
> try it out, and post the patch if it seems to work.  if you suggest
> that we throw away what we have and start over from scratch,
> please bring lots of food and money to the party...

Out of curiosity I took a look at the code and I could only think of one
optimisation:

Index: listobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/listobject.c,v
retrieving revision 2.114
diff -r2.114 listobject.c
775c775
<       args = Py_BuildValue("(OO)", x, y);
---
>       args = PyTuple_New(2);
777a778,781
>       PyTuple_SetItem(args, 0, x);
>       Py_INCREF(x);
>       PyTuple_SetItem(args, 1, y);
>       Py_INCREF(y);

This takes the difference between '.sort()' and '.sort(cmp)' for a list of
integers down from 6.5x slower to 4.5x slower on my machine (300,000 element
list). But the improvement is fairly marginal in my book since, unless your
comparison function is a C function, invoking the bytecode interpreter makes
it at least 10x slower (12x slower without the above change).

Beyond that I couldn't see much in the way of obvious changes that could be
made. It just fundamentally takes a non-zero amount of time to make a
PyObject_Call and if you make 5 million of them in rapid succession then it
adds up.

Jonathan