[Patches] [ python-Patches-661440 ] Refactor and streamline PyCFunction_Call

Fri, 03 Jan 2003 16:19:16 -0800

Patches item #661440, was opened at 2003-01-03 04:06
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=661440&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
>Resolution: Accepted
Priority: 5
Submitted By: Raymond Hettinger (rhettinger)
>Assigned to: Raymond Hettinger (rhettinger)
Summary: Refactor and streamline PyCFunction_Call

Initial Comment:
Refactor PyCFunction_Call with an eye towards 
clarity and speed while leaving the semantics 
unchanged.  

It is now obvious which combinations of flags are 
allowed; the new structure makes it easier to add new 
flags and flag combinations; and every path runs 
faster (by having fewer jumps, filling variables only 
when and where needed, and by merging a test into 
the switch logic).

* Incorporated the keyword flag test into switch/case.
* Deferred initialization of "size" until when and where 
needed.
* Inverted error tests so that the common case has 
no jumps.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2003-01-04 01:19

Message:
Logged In: YES 
user_id=21627

Since Jeremy is in favour, and I'm not strongly opposed,
please consider this patch accepted. I'm primary opposed
because of the principle "never change a running system", to
which I'm not married, either :-)

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-01-04 01:00

Message:
Logged In: YES 
user_id=80475

Are you willing to accept the patch on the basis of the 
semantic improvement?

Also, I tried the patch on a couple of my real world, 
published scripts (a matrix package and a puzzle solving 
graph traverser) and found that there was still a 
measurable though slight performance gain which isn't 
bad considering that they do a lot of work outside the 
affected code block (most optimizations affect only a 
single step and rarely give across the board significant 
speed boosts).

If you say no, that's cool.  Though I think the patch is the 
right thing to do, I'm not married to it and won't lose sleep 
if it gets rejected.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-01-03 21:12

Message:
Logged In: YES 
user_id=21627

I think any performance changes of this code are likely
irrelevant, for any real application. For example, it might
be that compilers can't efficiently dispatch the jump in the
new code, where they could have done so in the old code.

For gcc 3.2, the old code, for METH_O, uses four direct
jumps. The new code uses one computed and three direct
jumps, so it is slightly more expensive, in terms of jump
instructions; the function is also 40 bytes larger (313
bytes vs 357 bytes).

As I said, I cannot find one version particularly more clear
than the other - the new version could be improved by
putting the two keyword cases next to each other (something
that the old version did cleanly - you could easily tell
that keywords are processed first, and the special cases
then would not need to deal with keywords).

As for "leaving the semantics unchanged": there is a change
in semantics: METH_NOARGS|METH_KEYWORDS used to "work" (the
keyword arguments would be ignored), but gives an error
under the change. That change in semantics might be the only
reason to accept the patch, since it is now more correct.

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2003-01-03 20:57

Message:
Logged In: YES 
user_id=31392

Sounds good to me.  The code looks clear enough.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-01-03 20:52

Message:
Logged In: YES 
user_id=80475

Okay, I improved the timing suite and found about a 5 to 
10% savings (suite attached as dict.diff and timemeth.py):

Case                NewTime        OldTime
--------                --------------         -------------
meth_o             10.60              11.53
meth_vargs      8.13                8.64
meth_noargs    8.08                8.90
meth_var&kw   8.07                8.46

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-01-03 19:47

Message:
Logged In: YES 
user_id=80475

Without making dummy C functions and C caller, I can't 
get a timing suite that isolates the improvement.  The 
timing code attached below shows no change.

>From code inspection, it's easy to see the operation count 
go down.  

For meth_o, meth_noargs and meth_oldargs, it saves the 
flags&keywords step and converts a usually-fail-jump if 
into a usually-succeed-nojump. 

For meth_vargs, it saves the size lookup, the 
flags&keywords step, saves the dict-empty test, and 
converts a usually-fail-jump if into a usually-succeed-
nojump.

For the meth_keywords case, it replaces the size lookup, 
the flags&keywords step, and a usually-succeed-no-jump if 
with a single switch/case jump.  Probably, a net wash 
here.

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2003-01-03 18:06

Message:
Logged In: YES 
user_id=31392

Do you have any measurements of the speedup?

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-01-03 12:22

Message:
Logged In: YES 
user_id=80475

For me, the clarity comes from all allowable flag 
combinations being shown in the switch/case.

The duplication of the test for empty dictionaries and the 
computation of size are done for speed (only being done 
when and where needed).  The duplication is warranted 
only because the function is on the critical path for just 
about everything in Python and here speed really matters.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-01-03 11:40

Message:
Logged In: YES 
user_id=21627

I'm uncertain why this makes the function clearer; it
appears to make it merely different, as it duplicates the
test for empty dictionaries, and the computation of the size.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-01-03 11:19

Message:
Logged In: YES 
user_id=80475

Should've been assigned to Martin who wrote some of the 
existing code for that function.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=661440&group_id=5470