[Patches] [ python-Patches-590843 ] list sort perf boost

noreply@sourceforge.net noreply@sourceforge.net
Sun, 04 Aug 2002 14:43:07 -0700


Patches item #590843, was opened at 2002-08-04 15:22
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=590843&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
>Assigned to: Neal Norwitz (nnorwitz)
Summary: list sort perf boost

Initial Comment:
I don't really love this patch--the names suck.  There
are also 2 warnings.  The warnings can be removed by
casting, but I wanted to see if anybody had better ideas.

However, I can't let Tim have all the fun optimizing
the hell out of list sort.  This patch gets rid of
COMPARE == NULL in the ISLT macro.  A structure is
created which holds both the cmp function and
PyObject/Py_LT.

I get a speed up of between ~1-3%.  Using the largest
#s, 7.67 -> 7.59 for *sort.

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-08-04 17:43

Message:
Logged In: YES 
user_id=31435

Heh -- cute!  It doesn't make a measurable difference on my 
MSVC6 box, but I can believe it helps a bit elsewhere.

However, the warnings are trying to tell us something:  
casting PyObject_RichCompareBool to a type that takes a 
void* 3rd argument instead of an int is likely to work on 
boxes where sizeof(void*) == sizeof(int) (like my box, and 
presumably yours too), but it's lying in a potentially 
dangerous way on other boxes.  If sizeof(void*) > sizeof
(int), it's still likely to work on big-endian machines, but it 
gets very iffy on little-endian boxes 
(PyObject_RichCompareBool can pick up just the least-
significant bits then).

On the third hand, Py_LT is currently a #define for 0, so it 
may well work (albeit by accident) even then.

I don't think there's enough gain here to justify that level of 
obscure x-platform pain, so sorry to say I'm inclined to 
reject this.

A different route you may want to pursue:  checking for 
failure after every function call is likely a significant expense 
(test, branch, test, branch, test, branch, ...), which 
judicious use of setjmp/longjmp could alleviate.  The tricky 
bit there is ensuring that failures in merge_lo and merge_hi 
arrange to get the temp array copied back in to the merge 
area (I expect they'd have to leave clues in the MergeState 
struct, which the longjmp target could act on).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=590843&group_id=5470