[Python-checkins] peps: Copy PEP text from revision 6c4d5d9dfc6d.

martin.v.loewis python-checkins at python.org
Thu Feb 16 13:05:01 CET 2012


http://hg.python.org/peps/rev/a54dd98146f7
changeset:   4058:a54dd98146f7
user:        Martin v. Loewis <martin at v.loewis.de>
date:        Thu Feb 16 13:04:56 2012 +0100
summary:
  Copy PEP text from revision 6c4d5d9dfc6d.

files:
  pep-0412.txt |  162 +++++++++++++++++++++++++++++++++++++++
  1 files changed, 162 insertions(+), 0 deletions(-)


diff --git a/pep-0412.txt b/pep-0412.txt
new file mode 100644
--- /dev/null
+++ b/pep-0412.txt
@@ -0,0 +1,162 @@
+PEP: 412
+Title: Key-Sharing Dictionary
+Version: $Revision$
+Last-Modified: $Date$
+Author: Mark Shannon <mark at hotpy.org>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 08-Feb-2012
+Python-Version: 3.3 or 3.4
+Post-History: 08-Feb-2012
+
+
+Abstract
+========
+
+This PEP proposes a change in the implementation of the builtin dictionary
+type ``dict``. The new implementation allows dictionaries which are used as
+attribute dictionaries (the ``__dict__`` attribute of an object) to share
+keys with other attribute dictionaries of instances of the same class.
+
+Motivation
+==========
+
+The current dictionary implementation uses more memory than is necessary
+when used as a container for object attributes as the keys are
+replicated for each instance rather than being shared across many instances
+of the same class.
+Despite this, the current dictionary implementation is finely tuned and
+performs very well as a general-purpose mapping object.
+
+By separating the keys (and hashes) from the values it is possible to share
+the keys between multiple dictionaries and improve memory use.
+By ensuring that keys are separated from the values only when beneficial,
+it is possible to retain the high-performance of the current dictionary
+implementation when used as a general-purpose mapping object.
+
+Behaviour
+=========
+
+The new dictionary behaves in the same way as the old implementation.
+It fully conforms to the Python API, the C API and the ABI.
+
+Performance
+===========
+
+Memory Usage
+------------
+
+Reduction in memory use is directly related to the number of dictionaries
+with shared keys in existence at any time. These dictionaries are typically
+half the size of the current dictionary implementation.
+
+Benchmarking shows that memory use is reduced by 10% to 20% for
+object-oriented programs with no significant change in memory use
+for other programs.
+
+Speed
+-----
+
+The performance of the new implementation is dominated by memory locality
+effects. When keys are not shared (for example in module dictionaries
+and dictionary explicitly created by dict() or {} ) then performance is
+unchanged (within a percent or two) from the current implementation.
+
+For the shared keys case, the new implementation tends to separate keys
+from values, but reduces total memory usage. This will improve performance
+in many cases as the effects of reduced memory usage outweigh the loss of
+locality, but some programs may show a small slow down.
+
+Benchmarking shows no significant change of speed for most benchmarks.
+Object-oriented benchmarks show small speed ups when they create large
+numbers of objects of the same class (the gcbench benchmark shows a 10%
+speed up; this is likely to be an upper limit).
+
+Implementation
+==============
+
+Both the old and new dictionaries consist of a fixed-sized dict struct and
+a re-sizeable table.
+In the new dictionary the table can be further split into a keys table and
+values array.
+The keys table holds the keys and hashes and (for non-split tables) the
+values as well. It differs only from the original implementation in that it
+contains a number of fields that were previously in the dict struct.
+If a table is split the values in the keys table are ignored, instead the
+values are held in a separate array.
+
+Split-Table dictionaries
+------------------------
+
+When dictionaries are created to fill the __dict__ slot of an object, they are
+created in split form. The keys table is cached in the type, potentially
+allowing all attribute dictionaries of instances of one class to share keys.
+In the event of the keys of these dictionaries starting to diverge,
+individual dictionaries will lazily convert to the combined-table form.
+This ensures good memory use in the common case, and correctness in all cases.
+
+When resizing a split dictionary it is converted to a combined table.
+If resizing is as a result of storing an instance attribute, and there is
+only instance of a class, then the dictionary will be re-split immediately.
+Since most OO code will set attributes in the __init__ method, all attributes
+will be set before a second instance is created and no more resizing will be
+necessary as all further instance dictionaries will have the correct size.
+For more complex use patterns, it is impossible to know what is the best
+approach, so the implementation allows extra insertions up to the point
+of a resize when it reverts to the combined table (non-shared keys).
+
+A deletion from a split dictionary does not change the keys table, it simply
+removes the value from the values array.
+
+Combined-Table dictionaries
+---------------------------
+
+Explicit dictionaries (dict() or {}), module dictionaries and most other
+dictionaries are created as combined-table dictionaries.
+A combined-table dictionary never becomes a split-table dictionary.
+Combined tables are laid out in much the same way as the tables in the old
+dictionary, resulting in very similar performance.
+
+Implementation
+==============
+
+The new dictionary implementation is available at [1]_.
+
+Pros and Cons
+=============
+
+Pros
+----
+
+Significant memory savings for object-oriented applications.
+Small improvement to speed for programs which create lots of similar objects.
+
+Cons
+----
+
+Change to data structures:
+Third party modules which meddle with the internals of the dictionary
+implementation will break.
+Changes to repr() output and iteration order:
+For most cases, this will be unchanged.
+However for some split-table dictionaries the iteration order will
+change.
+
+Neither of these cons should be a problem.
+Modules which meddle with the internals of the dictionary
+implementation are already broken and should be fixed to use the API.
+The iteration order of dictionaries was never defined and has always been
+arbitrary; it is different for Jython and PyPy.
+
+References
+==========
+
+.. [1] Reference Implementation:
+   https://bitbucket.org/markshannon/cpython_new_dict
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+

-- 
Repository URL: http://hg.python.org/peps


More information about the Python-checkins mailing list