
The Python 2.3 release schedule as laid out in PEP 283 is unrealistic. It has May 22 (yesterday) as the release date for 2.3a1, and then promises successive releases at approximately 4-week intervals. I think we're nowhere near doing the first alpha release. PythonLabs has been consumed by high-priority Zope projects. There's a little less pressure now, and we can go back to thinking about Python releases, but we've lost a lot of time. We haven't even appointed a release manager! (Or the appointment was never recorded in the PEP.)

In the discussion about release stability, it was brought up that the release schedule should be driven by feature completeness, not by the calendar. I basically agree, although I believe that when it takes the party responsible for a particular feature forever to implement it, it's better to move the feature to a future release than to hold up the release forever.

So let's discuss features for 2.3. PEP 283 mentions these:
PEP 266 Optimizing Global Variable/Attribute Access Montanaro
PEP 267 Optimized Access to Module Namespaces Hylton
PEP 280 Optimizing access to globals van Rossum

(These are basically three friendly competing proposals.) Jeremy has made a little progress with a new compiler, but it's going slowly and the compiler is only the first step. Maybe we'll be able to refactor the compiler in this release. I'm tempted to say we won't hold our breath for this one.
PEP 269 Pgen Module for Python Riehl
I haven't heard from Jon Riehl, so I consider dropping this idea.
PEP 273 Import Modules from Zip Archives Ahlstrom
I think this is close -- maybe it's already checked in and I don't know about it!
PEP 282 A Logging System Mick
Vinay Sajip has been making steady progress on an implementation, and despite a recent near-flamewar (which I haven't followed) I expect that his code will be incorporated in the near future.

That's it as far as what PEP 283 mentions (there's also a bunch of things already implemented, like bool, PyMalloc, universal newlines). Here are some ideas of my own for things I expect to see in 2.3a1:

- Provide alternatives for common uses of the types module; essentially Skip's proto-PEP.

- Extended slice notation for all built-in sequences. Wasn't Raymond Hettinger working on this?

- Fix the buffer object??? http://mail.python.org/pipermail/python-dev/2002-May/023896.html

- Lazily tracking tuples? Neil, how's that coming? http://mail.python.org/pipermail/python-dev/2002-May/023926.html

- Timeoutsocket. Work in progress. http://mail.python.org/pipermail/python-dev/2002-May/024077.html

- Making None a keyword. Can't be done right away, but a warning would be a first step. http://mail.python.org/pipermail/python-dev/2002-April/023600.html

- Stage 2 of the int/long integration (PEP 237). This mostly means warning about situations where hex, oct or shift of an int returns a different value than for the same value as a long. (I think the PEP misses this step, but it's necessary -- we can't just change the semantics silently without warning first.) A concrete illustration follows at the end of this message.

- I think Andrew Kuchling has a bunch of distutils features planned; how's that coming?

- PEP 286??? (MvL: Enhanced argument tuples.) I haven't had the time to review this thoroughly. It seems a deep optimization hack (also makes better correctness guarantees though).

- A standard datetime type. An implementation effort is under way: http://www.zope.org/Members/fdrake/DateTimeWiki/FrontPage Effbot and MAL have a proposal for a basic interface that all datetime types should implement, but there are some problems with UTC. A decision needs to be made.

Anything else?

--Guido van Rossum (home page: http://www.python.org/~guido/)
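To make the hex/oct/shift point above concrete, a minimal session sketch of what 2.2 does today (assuming a 32-bit build; the int results below are platform-dependent, the long results are not):

    >>> hex(-1)        # int: 32-bit two's-complement view (assuming 32-bit Python 2.2)
    '0xffffffff'
    >>> hex(-1L)       # the same value as a long
    '-0x1L'
    >>> 1 << 31        # int shift wraps around silently
    -2147483648
    >>> 1L << 31       # long shift auto-overflows, as PEP 237 wants
    2147483648L

These are exactly the pairs where the stage-2 warning would fire before the int behavior changes to match the long behavior.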

On Fri, 24 May 2002, Guido van Rossum wrote:
The last thing i remember from this discussion was spending a lot of time trying to understand everyone's implementation schemes and drawing little pictures (http://web.lfw.org/python/globals.html) to describe them. I never heard anything back about them, and the discussion on this topic went suddenly silent after that. Was there a correlation (i hope not) or did everyone just get busy with other things? I was hoping that the figures would help us compare and evaluate the various schemes. -- ?!ng

[Ka-Ping Yee, on various vrbl-access speedup schemes]
Do you read Python-Dev? Lots of people commented appreciatively on them.
The Python conference happened then, and that took everyone (including you) away for a week. Perhaps during the conference people realized that implementing one of these things would actually require work.
I was hoping that the figures would help us compare and evaluate the various schemes.
I'm not sure the figures can measure runtime <wink>, but they certainly helped people *understand* the proposals.

Guido van Rossum wrote:
- Lazily tracking tuples? Neil, how's that coming? http://mail.python.org/pipermail/python-dev/2002-May/023926.html

On Fri, May 24, 2002 at 06:17:37PM -0400, Guido van Rossum wrote:
PEP 269 Pgen Module for Python Riehl
I haven't heard from Jon Riehl, so I consider dropping this idea.
I've also done some similar work. I ran into problems when trying to deallocate parser objects, and my interface is nothing like the one in the PEP. If there's interest, I might look at this again in the next few weeks. Jeff

I don't know if there's interest. You might ask on c.l.py. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Fri, May 24, 2002 at 06:17:37PM -0400, Guido van Rossum wrote:
<PLUG> My fastnames patch achieves much of what these PEPs are aiming for with minimal changes to the interpreter: http://tothink.com/python/fastnames </PLUG> Oren

[Oren Tirosh]
My fastnames patch achieves much of what these PEPs are aiming for with minimal changes to the interpreter: http://tothink.com/python/fastnames
Mm, interesting. I must've missed this post in the busy days after the conference. :-( Two observations:

- Your benchmarks use an almost empty dict (for locals and globals at least). I'd like to see how they perform with a realistic number of other names in the dict.

- I'm worried that negative dict entries could fill the dict beyond its capacity.

--Guido van Rossum (home page: http://www.python.org/~guido/)

On Sat, May 25, 2002 at 08:17:47AM -0400, Guido van Rossum wrote:
True, the fastest code path (the macro) only works as long as the entry is in the first hash position. For the tests in my posting this is 100% of the cases. For real code the results are around 75%. To enable the display of dictionary statistics on exit, compile with -DSHOW_DICTIONARY_STATS.

One problem is that 75% is still not good enough - the fallback code path is significantly slower (although still faster than PyDict_GetItem). Another problem is that if you get a hash collision for a global symbol used inside a tight loop, the hit rate can drop closer to 0%.

One trick that may help is to shuffle the hash entries - for every 100th time the macro fails, the entry will be moved up to the first hash position and the entry which previously occupied that position will be moved to the first empty hash position for its own hash chain. Statistically, this will ensure that the most commonly referenced names tend to stay at the first hash position. I think it may improve the hit rate from 75% to 85% or higher and eliminate the worst-case scenario.
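In Python terms the reshuffle is a periodic move-to-front within a hash chain. A toy model of the idea (names like RESHUFFLE_PERIOD and Chain are invented for illustration; the real patch manipulates PyDictEntry slots in C):

    RESHUFFLE_PERIOD = 100   # invented constant; the patch uses "every 100th miss"

    class Chain:
        # One hash chain: pairs[0] plays the role of the "first hash
        # position" that the inline macro checks.
        def __init__(self, pairs):
            self.pairs = list(pairs)
            self.misses = 0

        def lookup(self, key):
            # Fast path: only the first slot is examined.
            if self.pairs and self.pairs[0][0] == key:
                return self.pairs[0][1]
            # Fallback: scan the chain; every 100th miss, promote the
            # hit to the front so hot names migrate there.
            self.misses = self.misses + 1
            for i in range(len(self.pairs)):
                if self.pairs[i][0] == key:
                    value = self.pairs[i][1]
                    if self.misses % RESHUFFLE_PERIOD == 0:
                        self.pairs[0], self.pairs[i] = self.pairs[i], self.pairs[0]
                    return value
            raise KeyError(key)

A hot name that collides into a later slot pays the slow path at most about 100 times before it lands in front, which is what bounds the worst case.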
- I'm worried that negative dict entries could fill the dict beyond its capacity.
For each dictionary, the number of negative entries is bounded by the number of global and builtin names in the co_names of all code objects that get executed with that dictionary as their namespace. For general-purpose dictionaries, of course, there is no such upper bound, and therefore this optimization cannot be used in PyDict_GetItem. Oren
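A Python-level model of what a negative entry buys (a sketch of the idea only -- _ABSENT, _MISSING and load_global are invented names, not the patch's C code):

    _ABSENT = object()     # the "negative entry" marker
    _MISSING = object()    # distinct from a stored value of None

    class Namespace:
        def __init__(self, globals, builtins):
            self.globals = globals       # may accumulate negative entries
            self.builtins = builtins

        def load_global(self, name):
            v = self.globals.get(name, _MISSING)
            if v is _ABSENT:             # cached miss: skip the globals probe
                return self.builtins[name]
            if v is not _MISSING:
                return v
            if name in self.builtins:    # first miss: remember it
                self.globals[name] = _ABSENT
                return self.builtins[name]
            raise NameError(name)

Growth is bounded exactly as described above: only names some executed code object actually references can ever acquire a negative entry, and a later real assignment simply overwrites the marker.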

Piling more complexity on is likely to slow down the common case though.
And that's my worry. Suppose a program has only one global but touches every builtin. Will the dictionary properly get resized to accommodate all those negative entries? --Guido van Rossum (home page: http://www.python.org/~guido/)

On Sat, May 25, 2002 at 10:11:26AM -0400, Guido van Rossum wrote:
Not at all. The common case (the macro) will not have a single instruction added. The rare case (the fallback function) will be slowed down by two instructions: decrement a global variable, jump if zero. The reshuffle will be triggered every 100th occurrence of the rare case (which may be every 10000th occurrence of the common case).

It's the same approach used when designing a memory cache for a CPU: it's OK if a cache miss is made slower by loading an entire cache row, as long as it improves the cache hit rate enough to justify it.
    >>> len(dir(__builtins__)) * 24   # sizeof(PyDictEntry) on i386
    2736
Caching of string hashes also costs memory. Speed optimizations often do. These costs need to be weighed against complexity, performance, and even the code size - other alternatives may use up as much code space as this one uses data in this worst-case scenario. Oren

OK, OK. I timed pystone, and it's about 1.5% faster. If that's the best we can do (and I think that any of the PEPs trying to optimize global and builtin access probably have about the same improvement rate as your code), I'm not sure we should worry about this. I'd rather try to make method calls and instance attribute access (both methods and instance variables) faster...
I wasn't worried about size; I was worried about correctness. But it appears that your code is fine.

BTW, here's a benchmarking tip. Instead of timing this:

    for i in xrange(10000000):
        hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ;

you should time the for loop in this code:

    x = [0]*10000000
    for i in x:
        hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ;

The xrange iterator allocates (and implicitly deallocates) an int per iteration, and you don't use that int at all. (10 million pointers is 40 MB -- if that's too much, it's OK to test with 1 million iterations.)

Quick outcome. CVS python 2.3:

    builtin: 8.480
    global:  6.780
    local:   5.080
    fast:    2.890

With your patch:

    builtin: 4.660
    global:  4.160
    local:   4.020
    fast:    3.000

Hm, this reports:

    0 inline dictionary lookups
    1568 fast dictionary lookups
    121 slow dictionary lookups
    0.00% inline
    7.16% slow
    created 308 string dicts
    converted 32 to normal dicts
    10.39% conversion rate

Somehow the counts seem to be off by several orders of magnitude. What *do* these numbers report?

Also note that your patch slows down fast access (by 3%)! How can it? Adding more code to the interpreter's inner loop changes the cache behavior, etc. Tim Peters can tell you more about this.

--Guido van Rossum (home page: http://www.python.org/~guido/)
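To see how much of a measurement is pure iteration machinery, one can time empty loops both ways first -- a throwaway sketch (the function names are mine; Tim's script later in the thread uses the preallocated-list form throughout):

    from time import clock as now

    N = 1000000
    preallocated = [None] * N    # no int allocation per iteration

    def empty_xrange():
        for i in xrange(N):      # allocates/frees an int each pass
            pass

    def empty_list():
        for i in preallocated:
            pass

    for f in (empty_xrange, empty_list):
        start = now()
        f()
        print "%-15s %.2f" % (f.__name__, now() - start)

The difference between the two is the per-iteration allocation noise that would otherwise be charged to the name lookups being measured.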

On Sat, May 25, 2002 at 12:41:23PM -0400, Guido van Rossum wrote:
pystone and other benchmarks spend their time in loops using fastlocals. I used to have a version where SHOW_DICTIONARY_STATS also counted fastlocals accesses for reference - they were 97% for pybench. You're right that we need to find if this is really a problem for real-life code. Are there any measurements for the speedup achieved in Zope by the argument/local tricks? With a little more work this dictionary optimization should be applicable to attributes, too.
I wasn't worried about size; I was worried about correctness. But it appears that your code is fine.
It isn't :-) I haven't implemented resizing of the hash table yet. If there is not enough room the negative entry isn't inserted. This bug is documented in my original posting.
Hm, this reports: 0 inline dictionary lookups
It's a rebuild dependency problem. Make sure that ceval.c also gets recompiled after setting SHOW_DICTIONARY_STATS. When collecting stats the inline is not really inline. Sorry about that.
In my measurements there was no such slowdown. Is this consistent? Your cache behavior theory reminds me of an algorithm that was sped up by almost 10% when a certain array type was changed from 64 to 67 entries. For modern CPUs the common practice of using powers of two can cause cache thrashing. Oren

[Guido, to Oren Tirosh]
Not unless someone pays me to <wink>. Here's a cute one: I changed the "break" at the end of LOAD_GLOBAL to "continue". It didn't change the speed of global-lookup tests at all, but did give a small boost to fastlocal lookups. On several occasions we've seen evidence that ceval is supremely sensitive to I-cache accidents, although Marc-Andre is a lot better at provoking them than I am <wink -- but IIRC he once got a 15% slowdown by adding an unreachable(!) printf to ceval>. give-me-two-weeks-of-uninterrupted-time-with-a-hw-simulator-and-you'll- get-a-real-answer-ly y'rs - tim

Oren, I want to make sure we're measuring something useful here. Your timing programs for "global" and "builtin" access all go through LOAD_NAME, which is rarely interesting -- except in "who cares?" (class bodies) and pathological (some kinds of exec/import* contexts) cases, global and builtin access go through LOAD_GLOBAL instead.

On my box, that makes a huge difference. For example, accessing a global via LOAD_GLOBAL is measurably faster than accessing a local via LOAD_NAME here; LOAD_NAME isn't written for speed (it does a test+taken-branch first thing in every non-system-error case; LOAD_GLOBAL doesn't).

Attached is a self-contained timing program that times all these ways, and addresses Guido's concerns about timing methodology. A typical run on my box displays:

    fastlocals                 2.40
    globals_via_load_global    3.27  (*)
    locals_via_load_name       3.46
    globals_via_load_name      4.56
    builtins_via_load_global   5.99  (*)
    builtins_via_load_name     7.12  (*)

(*) The code at <http://tothink.com/python/fastnames/posting.txt> doesn't time these common cases.

Cross-run timing variance on a quiet machine here is +/-2 in the last digit (< 1%). Timings were done under a current CVS release build, not using Python's -O switch.

    from time import clock as now

    indices = [None] * 1000000

    def display(name, time):
        print "%-25s %5.2f" % (name, time)

    def fastlocals():
        hex = 5
        for i in indices:
            hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ;

    def locals_via_load_name():
        class f:
            hex = 5
            for i in indices:
                hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ;

    heh = 5

    def globals_via_load_name():
        class f:
            for i in indices:
                heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ;

    def globals_via_load_global():
        for i in indices:
            heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ; heh ;

    def builtins_via_load_name():
        class f:
            for i in indices:
                hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ;

    def builtins_via_load_global():
        for i in indices:
            hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ; hex ;

    for f in (fastlocals,
              globals_via_load_global,
              locals_via_load_name,
              globals_via_load_name,
              builtins_via_load_global,
              builtins_via_load_name):
        start = now()
        f()
        elapsed = now() - start
        display(f.__name__, elapsed)

I finally got around to applying Oren's patch. Nice work, BTW! Here are before-and-after timings for the test program I posted before. This is current CVS, release build, Win98SE, MSVC 6, 866MHz:

                                seconds
    function                 before  after  % speedup
    --------                 ------  -----  ---------
    fastlocals                 2.40   2.39      0
    globals_via_load_global    3.27   2.53     29
    locals_via_load_name       3.46   2.66     30
    globals_via_load_name      4.56   2.82     62
    builtins_via_load_global   5.99   4.04     48
    builtins_via_load_name     7.12   4.84     47

I only care about the XYZ_load_global times (speeding LOAD_NAME is like speeding multiplication by 1531 <wink>), and these numbers are nice.

On Mon, May 27, 2002 at 12:32:31AM -0400, Tim Peters wrote:
Here are before-and-after timings for your test program. Python 2.2.1, Linux, gcc 2.96, 866MHz:

    function                 before  after  % speedup
    --------                 ------  -----  ---------
    fastlocals                 2.63   2.58      2  (?!?)
    globals_via_load_global    4.04   3.43     18
    builtins_via_load_global   7.87   5.64     40

Again, without INLINE_DICT_GETITEM_INTERNED:

    function                 before  after  % speedup
    --------                 ------  -----  ---------
    fastlocals                 2.63   2.59      2
    globals_via_load_global    4.04   3.55     14
    builtins_via_load_global   7.87   5.66     39

It looks like the inline macro doesn't really account for much of the improvement. Oren

On Fri, May 24, 2002 at 06:17:37PM -0400, Guido van Rossum wrote:
- I think Andrew Kuchling has a bunch of distutils features planned; how's that coming?
That would be PEP 262, a database of installed Python packages. This May I've been mostly writing, not programming, but will try to get to it ASAP. --amk

- Extended slice notation for all built-in sequences. Wasn't Raymond Hettinger working on this?
Yes!
Anything else?
Perhaps an iterator tools module featuring goodies from SML and Haskell:

    import itertools
    ia = itertools.iter(open('readme.txt'))
    ia[4]                 # nth / index
    ia[0]                 # first
    ia[-1]                # last
    ia[:10]               # tail / drop
    ia[5:]                # head / take
    ia[0,100,2]           # slicing
    ia.enumerate(countfrom=0)
    ia.filter(pred)       # takewhile
    ia.invfilter(pred)    # dropwhile
    ia.map(func)
    ia.starmap(func)      # [yield func(*args) for args in ia]
    ia.cycle()            # repeats the seqn infinitely (requires aux mem)
    ia.unzip()            # creates multiple iters from one
    ia.tabulate(func, countFrom=0)  # sml: f[0], f[1], ... = ia.map(func, xrange(countFrom, sys.maxint))
    itertools.repeat(obj)           # while 1: yield obj
    itertools.zip(i1, i2, i3, ...)

Raymond Hettinger
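Several of these fall out of plain generators already. Sketches under one reading of the proposed semantics (not a shipped module; in 2.2 generators still need the __future__ import):

    from __future__ import generators   # generators are not yet default in 2.2

    def takewhile(pred, iterable):      # what ia.filter(pred) pencils in above
        # Yield items only until the predicate first fails.
        for x in iterable:
            if not pred(x):
                break
            yield x

    def dropwhile(pred, iterable):      # what ia.invfilter(pred) pencils in above
        # Skip items while the predicate holds, then yield the rest.
        it = iter(iterable)
        for x in it:
            if not pred(x):
                yield x
                break
        for x in it:
            yield x

    def tabulate(func, countfrom=0):    # sml-style: f[0], f[1], ...
        i = countfrom
        while 1:
            yield func(i)
            i = i + 1

    def repeat(obj):                    # while 1: yield obj
        while 1:
            yield obj

Whether these belong in a module or as methods on an iterator wrapper (Raymond's ia) is exactly the interface question the proposal raises.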

On Sun, May 26, 2002 at 01:19:15AM -0400, Raymond Hettinger wrote:
ia.filter(pred) # takewhile ia.invfilter(pred) # dropwhile
Err. I don't know what you mean by "filter", but in Haskell there is a big difference between filter and takeWhile:

    Prelude> filter (>3) [1..5]
    [4,5]
    Prelude> takeWhile (>3) [1..5]
    []
    Prelude> dropWhile (>3) [1..5]
    [1,2,3,4,5]

/Martin

--
Martin Sjögren  martin@strakt.com  ICQ: 41245059
Phone: +46 (0)31 7710870  Cell: +46 (0)739 169191
GPG key: http://www.strakt.com/~martin/gpg.html
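The same distinction in Python terms, reusing the takewhile/dropwhile generator sketches earlier in the thread:

    >>> filter(lambda x: x > 3, [1, 2, 3, 4, 5])           # keeps every match
    [4, 5]
    >>> list(takewhile(lambda x: x > 3, [1, 2, 3, 4, 5]))  # stops at the first failure
    []
    >>> list(dropwhile(lambda x: x > 3, [1, 2, 3, 4, 5]))  # 1 fails at once, so nothing is dropped
    [1, 2, 3, 4, 5]

So filter is a sieve over the whole sequence, while takeWhile/dropWhile split it at the first element where the predicate fails.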

[Guido]
It's unclear there whether you intend that a warning in 2.3 be coupled with (a) delivering the 2.2 result, or (b) delivering the new (auto-overflow) result. If a __future__ statement is introduced for this (incompatible changes are what __future__ is there for -- alas), #a is the obvious answer.
... Anything else?
Nuking SET_LINENO remains the conceptually clearest way to get a >5% boost in pystone. As other high-frequency paths in the interpreter have gotten leaner, the speed benefit of -O has gotten larger.

Guido,
Anything else?
Are you still pondering the inclusion of tar/bz2 support? Lars (tarfile's author) asked for some time to finish tarfile development and write a PEP about tarfile inclusion. Now that my vacation is over, I must get back to him and check what's going on. -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
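For reference, the external module's interface currently looks roughly like this -- a sketch from Lars's documentation as I recall it; none of this is in the stdlib yet, and names and mode strings may change before any PEP:

    import tarfile                  # Lars's external tarfile module (assumed API)

    # bz2 compression is selected via the mode string
    tar = tarfile.open("backup.tar.bz2", "r:bz2")
    for member in tar.getmembers():
        print member.name, member.size
    tar.close()

    out = tarfile.open("snapshot.tar.bz2", "w:bz2")
    out.add("mydir")                # adds a directory tree recursively
    out.close()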

Are you still pondering the inclusion of tar/bz2 support?
That's a library issue; I'm all for adding it if the code is robust enough, but not for holding up a release for it.
Please do! --Guido van Rossum (home page: http://www.python.org/~guido/)

participants (10):
- akuchlin@mems-exchange.org
- Guido van Rossum
- Gustavo Niemeyer
- jepler@unpythonic.net
- Ka-Ping Yee
- Martin Sjögren
- Neil Schemenauer
- Oren Tirosh
- Raymond Hettinger
- Tim Peters