[Patches] [ python-Patches-502415 ] optimize attribute lookups

noreply@sourceforge.net noreply@sourceforge.net
Sat, 23 Mar 2002 17:57:18 -0800


Patches item #502415, was opened at 2002-01-11 18:07
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Zooko O'Whielacronx (zooko)
Assigned to: Nobody/Anonymous (nobody)
Summary: optimize attribute lookups

Initial Comment:
This patch optimizes the string comparisons in
class_getattr(), class_setattr(), instance_getattr1(),
and instance_setattr().

I pulled out the relevant section of class_setattr()
and measured its performance, yielding the following
results:

 * in the case that the argument does *not* begin with
"__", then the new version is 1.03 times as fast as the
old.  (This is a mystery to me, as the path through the
code looks the same, in C.  I examined the assembly
that GCC v3.0.3 generated in -O3 mode, and it is true
that the assembly for the new version is
smaller/faster, although I don't really understand why.)

 * in the case that the argument is a string of random
length between 1 and 19 inclusive, and it begins with
"__" and ends with "X_" (where X is a random alphabetic
character), then the new version is 1.12 times as fast as
the old.

 * in the case that the argument is a string of random
length between 1 and 19 inclusive, and it begins with
"__" and does *not* end with "_", then the new version
is 1.16 times as fast as the old.

 * in the case that the argument is (randomly) one of
the six special names, then the new version is 2.7
times as fast as the old.

 * in the case that the argument is a string of random
length between 1 and 19 inclusive, and it begins with
"__" and ends with "__" (but is not one of the six
special names), then the new version is 3.7 times as
fast as the old.
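
The five input classes above can be made concrete with a small
generator.  This is only a sketch, not the harness used for the
measurements (those timed the C code directly).  The function and
case names are hypothetical, the "random length between 1 and 19"
is assumed to describe the letters that follow the leading "__",
and the six special names are assumed to be the ones that
class_setattr() compares against in the Python 2.2 sources; the
report itself does not list them.

```python
import random
import string

# Assumption: the six names special-cased by class_setattr() in Python 2.2.
SPECIAL_NAMES = ("__dict__", "__bases__", "__name__",
                 "__getattr__", "__setattr__", "__delattr__")


def random_letters(rng, n):
    """Return a string of n random ASCII letters."""
    return "".join([rng.choice(string.ascii_letters) for _ in range(n)])


def make_name(kind, rng):
    """Build one attribute name for the given benchmark case."""
    body = random_letters(rng, rng.randint(1, 19))
    if kind == "plain":            # does not begin with "__"
        return body
    if kind == "dunder_X_":        # begins with "__", ends with "X_"
        return "__" + body + "_"
    if kind == "dunder_prefix":    # begins with "__", does not end with "_"
        return "__" + body
    if kind == "special":          # one of the six special names
        return rng.choice(SPECIAL_NAMES)
    if kind == "dunder_other":     # "__...__" but not a special name
        while True:
            name = "__" + random_letters(rng, rng.randint(1, 19)) + "__"
            if name not in SPECIAL_NAMES:
                return name
    raise ValueError("unknown case: %r" % (kind,))


def classify(name):
    """Sort a name back into one of the five cases listed above."""
    if not name.startswith("__"):
        return "plain"
    if name in SPECIAL_NAMES:
        return "special"
    if name.endswith("__"):
        return "dunder_other"
    if name.endswith("_"):
        return "dunder_X_"
    return "dunder_prefix"
```

Feeding each case through classify() is a quick sanity check that
the generated names really fall into the intended bucket before
any timing is done.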



----------------------------------------------------------------------

>Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 01:57

Message:
Logged In: YES 
user_id=35752

Based on the complexity added by the patch I would say
at least a 5% speedup would be needed to offset the
maintenance cost.  -1 on the current patch.

----------------------------------------------------------------------

Comment By: Zooko O'Whielacronx (zooko)
Date: 2002-03-14 16:24

Message:
Logged In: YES 
user_id=52562

update:

I did a real app benchmark of this patch by running one of
the unit tests from PyXML-0.6.6.  (Which one?  The one that
I guessed would favor my optimization the most.
Unfortunately I've lost my notes and I don't remember which
one.)

I also separated out the "unroll strcmp" optimization from
the "use macros" optimization on request.

I have lost my notes, but I recall that my results showed
what I expected: between 0.5 and 3 percent app-level
speed-up for the unroll strcmp optimization.

Interesting detail: a quirk in GCC 3 makes the unroll strcmp
version slightly faster than the current strcmp version
*even* in the (common) case that the first two characters of
the attribute name are *not* '__'.

What should happen next:

1.  Someone who has the authority to approve or reject this
patch should tell me what kind of benchmark would be
persuasive to you.  I mean: what specific program I can run
with and without my patch for a useful comparison.  (If you
require more than a 5% app-level speed-up, then let's give
up on this patch now!)

2.  Someone should volunteer to test this patch with the
MSFT compiler, as I don't have one right now.  Some people
are still using the Windows platform, I've noticed [1], so
it is worth benchmarking.  Actually, someone should
volunteer to benchmark GCC+Linux-or-MacOSX, too, as my
computer is a laptop with a variable-speed CPU and is really
crummy for benchmarking.

By the way, PEP 266 is a better solution to the problem but
until it's implemented, this patch is the better patch.  ;-)

Note: this is one of those patches that looks uglier in
"diff -u" format than in actual source code.  Please browse
the actual source side-by-side [2] to see how ugly it really
is.

Regards

Zooko

[1] http://www.google.com/press/zeitgeist/jan02-pie.gif
[2] search for "class_getattr" in:
    http://zooko.com/classobject.c
    http://zooko.com/classobject-strcmpunroll.c

---
                 zooko.com
Security and Distributed Systems Engineering
---


----------------------------------------------------------------------

Comment By: Zooko O'Whielacronx (zooko)
Date: 2002-01-18 00:22

Message:
Logged In: YES 
user_id=52562

Okay I've done some "mini benchmarks".  The earlier reported
micro-benchmarks were the result of running the inner loop
itself, in C.  These mini benchmarks are the result of
running this Python script:

class A:
    def __init__(self):
        self.a = 0

a = A()
for i in xrange(2**20):
    a.a = i

print a.a

and then using different attribute names in place of `a'.
The results are as expected: the optimized version is faster
than the current one, depending on the shape of the
attribute name, and dampened by the fact that there is now
other work being done.  The case that shows the smallest
difference is when the attribute name neither begins nor
ends with an '_'.  In that case the above script runs about
2% faster with the optimizations.  The case that shows the
biggest difference is when the attribute begins and ends
with '__', as in `__a__'.  Then the above script runs about
15% faster.

This still isn't a *real* application benchmark.  I'm
looking for one that is a reasonable case for real Python
users but that also uses attribute lookups heavily.
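
The name substitution described above can be automated rather
than edited by hand.  The following is a hedged sketch, not the
script that produced the 2% and 15% figures: it assumes the
standard timeit module (available from Python 2.3), the helper
name time_attr_store is hypothetical, and the loop stores a
constant instead of the loop counter.  To reproduce the
comparison it would have to be run under an unpatched and a
patched Python 2 build; on Python 3 the class is new-style and
the classobject.c code paths are not exercised at all.

```python
import timeit

# Setup mirrors the script above: one class and one instance of it.
SETUP = """
class A:
    pass

a = A()
"""


def time_attr_store(name, number=2**20, repeat=3):
    """Return the best time in seconds for `number` stores to a.<name>."""
    # The statement is compiled outside any class body, so names such
    # as "__a" are not subject to private-name mangling.
    timer = timeit.Timer("a.%s = 1" % name, setup=SETUP)
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    # One name per shape discussed in the thread.
    for name in ("a", "_a", "__a", "__a_", "__a__"):
        print("%-8s %.3f s" % (name, time_attr_store(name)))
```

Taking the minimum over several repeats is the usual way to
reduce noise from other processes, which matters on a machine
with a variable-speed CPU.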


----------------------------------------------------------------------

Comment By: Zooko O'Whielacronx (zooko)
Date: 2002-01-17 20:33

Message:
Logged In: YES 
user_id=52562

Yeah, the optimized version is less readable than the original.

I'll try to come up with a benchmark application.  Any
ideas?  Maybe some unit tests from Zope that use attribute
lookups heavily?

My guess is that the actual results in an application will
be "marginal", like maybe between 0.5% and 3% improvement.



----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-01-17 18:29

Message:
Logged In: YES 
user_id=31392

This seems to add a lot of complexity for a few special
cases.  How important are these particular attributes?  Do
you have any benchmark applications that show real
improvement?  It seems like microbenchmarks overstate the
benefit, since we don't know how often these attributes are
looked up by most applications.

It would also be interesting to see how much of the benefit
for non __ names is the result of the PyString_AS_STRING()
macro.  Maybe that's all the change we really need :-).


----------------------------------------------------------------------
