[Python-Dev] The Trick (was Re: copysort patch,
was Re: inline sort option)
Alex Martelli
aleaxit at yahoo.com
Sat Oct 18 08:26:39 EDT 2003
Wondering about the trick of copysort not copying a singly-referenced list I
decided to try it out in a tiny extension module, and, yes, it is just as
trivial as one might wish (haven't dealt with optional args to sort, just
wanting to check performance &c):
static PyObject*
copysort(PyObject* self, PyObject* args)
{
PyObject *sequence, *listresult;
if(!PyArg_ParseTuple(args, "O", &sequence))
return 0;
if(PyList_CheckExact(sequence) && sequence->ob_refcnt==1) {
listresult = sequence;
Py_INCREF(listresult);
} else {
listresult = PySequence_List(sequence);
}
if(listresult) {
if(PyList_Sort(listresult) == -1) {
Py_DECREF(listresult);
listresult = 0;
}
}
return listresult;
}
and performance on an equally trivial testcase:
x = dict.fromkeys(range(99999))
def looponsorted1(x):
keys = x.keys()
keys.sort()
for k in keys: pass
def looponsorted2(x, c=copysort.copysort):
for k in c(x.keys()): pass
turns out to be identical between the two _with_ The Trick (4.4e+04 usec with
timeit.py -c on my box) while without The Trick copysort would slow down to
about 5.5e+04 usec.
But, this reminds me -- function filter, in bltinmodule.c, uses just about the
same trick too (to modify in-place when possible rather than making a new
list -- even though when it does make a new list it's an empty one, not a
copy, so the gain is less). There must be other cases of applicability which
just haven't been considered. So...
Shouldn't The Trick be embodied in PySequence_List itself...? So, the whole
small tricky part above:
if(PyList_CheckExact(sequence) && sequence->ob_refcnt==1) {
listresult = sequence;
Py_INCREF(listresult);
} else {
listresult = PySequence_List(sequence);
}
would collapse to a single PySequence_List call -- *AND* potential calls from
Python code such as "x=list(somedict.keys())" might also be speeded up
analogously...
[Such a call looks silly when shown like this, but in some cases one might not
know, in polymorphic use, whether a method returns a new or potentially
shared list, or other sequence, and a call to list() on the method's result
then may be needed to ensure the right semantics in all cases].
Is there any hidden trap in The Trick that makes it unadvisable to insert it
in PySequence_List? Can't think of any, but I'm sure y'all will let me know
ASAP what if anything I have overlooked...;-).
One might even be tempted to reach down all the way to PyList_GetSlice,
placing THERE The Trick in cases of all-list slicing of a singly-referenced
list (PyList_GetSlice is what PySequence_List uses, so it would also get the
benefit), but that might be overdoing it -- and encouraging list(xxx) instead
of xxx[:], by making the former a bit faster in some cases, would be no bad
thing IMHO (already I'm teaching newbies to prefer using list(...) rather
than ...[:] strictly for legibility and clarity, being able to mention
possible future performance benefits might well reinforce the habit...;-).
Alex
More information about the Python-Dev
mailing list