[Patches] [Patch #101309] Specialization of dictionaries for strings.

Wed, 30 Aug 2000 22:28:52 -0700

Patch #101309 has been updated. 

Project: 
Category: core (C code)
Status: Open
Summary: Specialization of dictionaries for strings.

Follow-Ups:

Date: 2000-Aug-25 22:26
By: fdrake

Comment:
This patch introduces some changes to the builtin dictionary type.
These changes create a specialization of the type to better support
efficient use of dictionaries as a runtime implementation detail.

This patch causes all newly created dictionaries to be specialized for
string keys; dictionaries are converted (trivially) to handle
arbitrary keys as needed.  If a non-string is used as a key (even if
only doing a lookup and it fails), the dictionary is converted to a
normal dictionary.  To achieve faster performance, a specialized
version of the lookdict() function is used.  The specialized
implementation can assume two things that the "normal" implementation
cannot:

  - that all comparisons are always done using the
    PyString_Type.tp_compare function, avoiding some of the overhead
    of calling PyObject_Compare() and dereferencing the comparison
    function repeatedly, and

  - that the comparison function never raises an exception, since two
    strings can always be compared without an error.

Some debugging code added to dictobject.c shows that most dictionaries
created during the regression test are never converted to "normal"
dictionaries, but always only use strings as keys:

	created 115874 string dicts
	converted 2658 to normal dicts
	2.29% conversion rate

Additional performance tests should be made.  I expect that a patch
such as this is most valuable if additional code is needed to
robustify lookdict() to properly deal with exceptions from
user-defined comparison functions (tp_compare handlers of __cmp__()
functions in Python code).  The patch duplicates the logic of the
lookdict() function, which makes maintenance more tedious.

-------------------------------------------------------

Date: 2000-Aug-30 12:27
By: marangoz

Comment:
I doubt we'll gain something from this, if anything at all. Some more stats:

for pystone:

1052981 total dict lookups, of which
2182 (0.21%) hash collisions/calls to cmp
0 (0.00%) failed object comparisons

for the regression test suite:

7266955 total dict lookups, of which
580978 (7.99%) hash collisions/calls to cmp
0 (0.00%) failed object comparisons

Postponed for further investigation. There's an idea in the lazy, runtime specialization of the lookup algorithm!
-------------------------------------------------------

Date: 2000-Aug-30 12:30
By: gvanrossum

Comment:
Guido needs to decide whether this is needed to make the other dict patch safe when tstate==NULL.
-------------------------------------------------------

Date: 2000-Aug-30 12:46
By: marangoz

Comment:
Looks like the other dict patch needs to use PyErr_xxx calls inside/after
PyThreadState_GET() != NULL checks. if tstate==NULL, the PyErr logic
must be skipped (this is an orphan state or a temporary state on early initialization & late finalization -- need to dbl check this once again).
-------------------------------------------------------

Date: 2000-Aug-30 22:28
By: fdrake

Comment:
Changed this a little:

- counting code that determines how often dictionaries are "degraded" is conditionalized and not included by default.

- Instead of calling PyObject_Compare() in lookdict_string(), directly call the string type's tp_compare handler.  This saves the various tests done in PyObject_Compare(), as well as the creation of a few C call frames.
-------------------------------------------------------

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=101309&group_id=5470