
Raymond writes:
I've already tried out a pure python proof-of-concept
Does that mean that you can give us some idea what kind of performance boost this actually resulted in? -- Michael Chermside

[Raymond]
I've already tried out a pure python proof-of-concept
[Michael Chermside]
Does that mean that you can give us some idea what kind of performance boost this actually resulted in?
It depends on what you're timing but it is not a big win. * Speed doubles in demo code that just makes references to globals but is much more modest when the builtins are called. This shows that the call time is more significant than the reference time: def f(i): dict; hasattr; float; pow; list; range # speed more than doubles hex(i); str(i); oct(i); int(i); float(i) # 12% gain * Contrived examples show the best gains while code from real apps show smaller improvements: def shuffle(random=random.random): # 6% gain x = list('abcdefghijklmnopqrstuvwyz0123456789') for i in xrange(len(x)-1, 0, -1): j = int(random() * (i+1)) x[i], x[j] = x[j], x[i] * PyStone does not use any builtins. * Scanning my own sources, it looks like some of the builtins almost never appear inside loops (dir, map, filter, zip, dict, range). The ones that are in loops usually do something simple (int, str, chr, len). Either way, builtin access never seems to dominate the running time. OTOH, maybe that's just the way I write code. Raymond Hettinger

Raymond Hettinger <python@rcn.com>:
* Scanning my own sources, it looks like some of the builtins almost never appear inside loops (dir, map, filter, zip, dict, range). The ones that are in loops usually do something simple (int, str, chr, len). Either way, builtin access never seems to dominate the running time. OTOH, maybe that's just the way I write code.
That's probably true in the large. However, sometimes one has a tight little loop that makes lots of calls to a builtin. I've occasionally improved the speed of something noticeably using the copy-a-builtin-to-a-local trick. Maybe for these cases there could be a "builtin" declaration, like "global" but declaring that something is to be found in the builtin scope? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+

Raymond Hettinger <python@rcn.com>:
* Scanning my own sources, it looks like some of the builtins almost never appear inside loops (dir, map, filter, zip, dict, range). The ones that are in loops usually do something simple (int, str, chr, len). Either way, builtin access never seems to dominate the running time. OTOH, maybe that's just the way I write code.
[Greg Ewing]
That's probably true in the large. However, sometimes one has a tight little loop that makes lots of calls to a builtin. I've occasionally improved the speed of something noticeably using the copy-a-builtin-to-a-local trick.
It will have to wait until Py2.4 and the issue will likely be subsumed by more sophisticated approaches that optimize all namespace access. Jeremy's DList technique looks especially promising. Similarly, I'm experimenting with a dict subclass that keeps its values in lists of length one and can return the container for clients interested in fast get or set access to the value associated with a given key. Also, I've been working on a faster design for dictionaries that increases overall sparseness (meaning fewer collisions) while increasing the density of entries that fit in a single cash line (reducing the cost of a miss). Increasing density involves splitting the arrays of PyDictEntry into separate arrays of hashes, keys, and values. Further, the entries are clustered into groups of up to 16 hash values that can fit in a single cache line. This also allows for a much tighter inner loop for the lookup function. Raymond Hettinger
participants (3)
-
Greg Ewing
-
Michael Chermside
-
Raymond Hettinger