PEP 267 improvement idea (fwd)

Oren Tirosh oren-py-l at hishome.net
Tue Jan 14 12:14:33 EST 2003


On Tue, Jan 14, 2003 at 09:54:48AM -0600, Skip Montanaro wrote:
> 
>     Oren> I know Guido likes the idea but the implementation is still far
>     Oren> from complete.
> 
> Can you elaborate on its incompleteness?

Ummmm... I've experimented with several different strategies and tricks.
I'm not so sure which ones are actually implemented in the latest
version of the patch on sourceforge.

One of the big improvements was speeding up failed lookups by adding
negative entries to "positively" identify keys not in the dictionary
(me_value == NULL, me_key != NULL, != dummy). This considerably speeds 
up the global/builtin lookup chain. Management of these entries is 
somewhere between limited to nonexistent. New entries are simply not 
added if there is not enough free space the table, resizing a dictionary 
with negative entries is probably buggy. Negative entries should also 
help instance/class/superclass lookup chain but it segfaults if used 
for anything other than LOAD_NAME and LOAD_GLOBAL so there must be other
bugs lurking there.

Inlining works really fast - but only if the entry is found in the first 
hash probe. I experimented with dynamically shuffling the entries to
ensure that the entry accessed most frequently would be first >90% of
the time instead of ~%75 of the time. This is not in the patch and I
don't have code that is anywhere near usable.

Interning - now that interned strings are not immortal this patch could
do be much more aggressive about interning everything. Dictionaries 
used as namespaces could be guaranteed to have only interned strings as 
entry keys and avoid the need for a second search pass on first failed 
lookup (before the negative entry is added). This is important because
many small temporary objects have attributes that are accessed only
once. Currently this first access is made slower by the patch instead
of faster.

The last strategy I was going to try before my free time shrunk
considerably (a job found me) was to create a new species of dictionary
in addition to "lookdict" and "lookdict_string". Dictionaries will start
their life as either "lookdict_string" or "lookdict_namespace",
depending on their expected usage. This is just an optimization hint -
the semantics are otherwise identical. A "lookdict_namespace" dictionary
interns all keys and adds negative entries for failed lookups. This
overhead pays when it's used as a namespace. Both "lookdict_namespace"
and "lookdict_string" will fall back to a "lookdict" dictionary when a 
non-string key is inserted.

Optimizing or inlining setitem for the common case where the entry 
already exists and is only set to a new value will also be nice.

    Oren





More information about the Python-list mailing list