[Python-checkins] peps: PEP 509: make the version private

victor.stinner python-checkins at python.org
Mon Jan 11 10:16:57 EST 2016


https://hg.python.org/peps/rev/42b276dc6a9b
changeset:   6160:42b276dc6a9b
user:        Victor Stinner <victor.stinner at gmail.com>
date:        Mon Jan 11 16:16:46 2016 +0100
summary:
  PEP 509: make the version private

* reorganize sections
* add microbenchmarks numbers
* rationale to not expose the version at Python level

files:
  pep-0509.txt |  356 ++++++++++++++++++++++----------------
  1 files changed, 204 insertions(+), 152 deletions(-)


diff --git a/pep-0509.txt b/pep-0509.txt
--- a/pep-0509.txt
+++ b/pep-0509.txt
@@ -1,5 +1,5 @@
 PEP: 509
-Title: Add dict.__version__
+Title: Add a private version to dict
 Version: $Revision$
 Last-Modified: $Date$
 Author: Victor Stinner <victor.stinner at gmail.com>
@@ -13,8 +13,8 @@
 Abstract
 ========
 
-Add a new read-only ``__version__`` property to ``dict`` and
-``collections.UserDict`` types, incremented at each change.
+Add a new private version to builtin ``dict`` type, incremented at each
+change, to implement fast guards on namespaces.
 
 
 Rationale
@@ -34,13 +34,19 @@
 these checks "guards".
 
 The speedup of optimizations depends on the speed of guard checks. This
-PEP proposes to add a version to dictionaries to implement efficient
-guards on namespaces.
+PEP proposes to add a version to dictionaries to implement fast guards
+on namespaces.
 
-Example of optimization: replace loading a global variable with a
-constant.  This optimization requires a guard on the global variable to
+Dictionary lookups can be skipped if the version does not change which
+is the common case for most namespaces. The performance of a guard does
+not depend on the number of watched dictionary entries, complexity of
+O(1), if the dictionary version does not change.
+
+Example of optimization: copy the value of a global variable to function
+constants.  This optimization requires a guard on the global variable to
 check if it was modified. If the variable is modified, the variable must
-be loaded at runtime, instead of using the constant.
+be loaded at runtime when the function is called, instead of using the
+constant.
 
 See the `PEP 510 -- Specialized functions with guards
 <https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of
@@ -51,21 +57,22 @@
 Guard example
 =============
 
-Pseudo-code of an efficient guard to check if a dictionary key was
-modified (created, updated or deleted)::
+Pseudo-code of an fast guard to check if a dictionary key was modified
+(created, updated or deleted) using an hypothetical
+``get_dict_version(dict)`` function::
 
     UNSET = object()
 
-    class Guard:
+    class GuardDictKey:
         def __init__(self, dict, key):
             self.dict = dict
             self.key = key
             self.value = dict.get(key, UNSET)
-            self.version = dict.__version__
+            self.version = get_dict_version(dict)
 
         def check(self):
             """Return True if the dictionary value did not changed."""
-            version = self.dict.__version__
+            version = get_dict_version(self.dict)
             if version == self.version:
                 # Fast-path: avoid the dictionary lookup
                 return True
@@ -80,100 +87,21 @@
             return False
 
 
-Changes
-=======
-
-Add a read-only ``__version__`` property to builtin ``dict`` type and to
-the ``collections.UserDict`` type. New empty dictionaries are initilized
-to version ``0``. The version is incremented at each change:
-
-* ``clear()`` if the dict was non-empty
-* ``pop(key)`` if the key exists
-* ``popitem()`` if the dict is non-empty
-* ``setdefault(key, value)`` if the `key` does not exist
-* ``__detitem__(key)`` if the key exists
-* ``__setitem__(key, value)`` if the `key` doesn't exist or if the value
-  is different
-* ``update(...)`` if new values are different than existing values (the
-  version can be incremented multiple times)
-
-Example::
-
-    >>> d = {}
-    >>> d.__version__
-    0
-    >>> d['key'] = 'value'
-    >>> d.__version__
-    1
-    >>> d['key'] = 'new value'
-    >>> d.__version__
-    2
-    >>> del d['key']
-    >>> d.__version__
-    3
-
-If a dictionary is created with items, the version is also incremented
-at each dictionary insertion. Example::
-
-    >>> d=dict(x=7, y=33)
-    >>> d.__version__
-    2
-
-The version is not incremented is an existing key is modified to the
-same value, but only the identifier of the value is tested, not the
-content of the value. Example::
-
-    >>> d={}
-    >>> value = object()
-    >>> d['key'] = value
-    >>> d.__version__
-    2
-    >>> d['key'] = value
-    >>> d.__version__
-    2
-
-.. note::
-   CPython uses some singleton like integers in the range [-5; 257],
-   empty tuple, empty strings, Unicode strings of a single character in
-   the range [U+0000; U+00FF], etc. When a key is set twice to the same
-   singleton, the version is not modified.
-
-The PEP is designed to implement guards on namespaces, only the ``dict``
-type can be used for namespaces in practice.  ``collections.UserDict``
-is modified because it must mimicks ``dict``. ``collections.Mapping`` is
-unchanged.
-
-
-Integer overflow
-================
-
-The implementation uses the C unsigned integer type ``size_t`` to store
-the version.  On 32-bit systems, the maximum version is ``2**32-1``
-(more than ``4.2 * 10 ** 9``, 4 billions). On 64-bit systems, the maximum
-version is ``2**64-1`` (more than ``1.8 * 10**19``).
-
-The C code uses ``version++``. On integer overflow, the version is
-wrapped to ``0`` (and then continue to be incremented).
-
-The check ``dict.__version__ == old_version`` can be true after an
-integer overflow, so a guard can return false even if the value changed,
-which is wrong. The bug occurs if the dict is modified at least ``2**64``
-times (on 64-bit system) between two checks of the guard.
-
-Using a more complex type (ex: ``PyLongObject``) to avoid the overflow
-would slow down operations on the ``dict`` type. Even if there is a
-theorical risk of missing a value change, the risk is considered too low
-compared to the slow down of using a more complex type.
-
-
-Usage of dict.__version__
+Usage of the dict version
 =========================
 
-astoptimizer of FAT Python
---------------------------
+Specialized functions using guards
+----------------------------------
 
-The astoptimizer of the FAT Python project implements many optimizations
-which require guards on namespaces. Examples:
+The `PEP 510 -- Specialized functions with guards
+<https://www.python.org/dev/peps/pep-0510/>`_ proposes an API to support
+specialized functions with guards. It allows to implement static
+optimizers for Python without breaking the Python semantics.
+
+Example of a static Python optimizer: the astoptimizer of the `FAT
+Python <http://faster-cpython.readthedocs.org/fat_python.html>`_ project
+implements many optimizations which require guards on namespaces.
+Examples:
 
 * Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
   ``builtins.__dict__['len']`` and ``globals()['len']`` are required
@@ -181,10 +109,6 @@
   guards on ``builtins.__dict__['range']`` and ``globals()['range']``
   are required
 
-The `FAT Python
-<http://faster-cpython.readthedocs.org/fat_python.html>`_ project is a
-static optimizer for Python 3.6.
-
 
 Pyjion
 ------
@@ -210,21 +134,143 @@
 <http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html>`_.
 
 
+Changes
+=======
+
+Add a ``PY_INT64_T ma_version`` field to the ``PyDictObject`` structure:
+64-bit unsigned integer. New empty dictionaries are initilized to
+version ``0``. The version is incremented at each change:
+
+* ``clear()`` if the dict was non-empty
+* ``pop(key)`` if the key exists
+* ``popitem()`` if the dict is non-empty
+* ``setdefault(key, value)`` if the `key` does not exist
+* ``__detitem__(key)`` if the key exists
+* ``__setitem__(key, value)`` if the `key` doesn't exist or if the value
+  is different
+* ``update(...)`` if new values are different than existing values (the
+  version can be incremented multiple times)
+
+Example using an hypothetical ``get_dict_version(dict)`` function::
+
+    >>> d = {}
+    >>> get_dict_version(d)
+    0
+    >>> d['key'] = 'value'
+    >>> get_dict_version(d)
+    1
+    >>> d['key'] = 'new value'
+    >>> get_dict_version(d)
+    2
+    >>> del d['key']
+    >>> get_dict_version(d)
+    3
+
+If a dictionary is created with items, the version is also incremented
+at each dictionary insertion. Example::
+
+    >>> d=dict(x=7, y=33)
+    >>> get_dict_version(d)
+    2
+
+The version is not incremented is an existing key is modified to the
+same value, but only the identifier of the value is tested, not the
+content of the value. Example::
+
+    >>> d={}
+    >>> value = object()
+    >>> d['key'] = value
+    >>> get_dict_version(d)
+    2
+    >>> d['key'] = value
+    >>> get_dict_version(d)
+    2
+
+.. note::
+   CPython uses some singleton like integers in the range [-5; 257],
+   empty tuple, empty strings, Unicode strings of a single character in
+   the range [U+0000; U+00FF], etc. When a key is set twice to the same
+   singleton, the version is not modified.
+
+
 Implementation
 ==============
 
-See the `issue #26058: Add dict.__version__ read-only property
-<https://bugs.python.org/issue26058>`_.
+The `issue #26058: PEP 509: Add ma_version to PyDictObject
+<https://bugs.python.org/issue26058>`_ contains a patch implementing
+this PEP.
 
 On pybench and timeit microbenchmarks, the patch does not seem to add
 any overhead on dictionary operations.
 
+When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for
+a dictioanry lookup, whereas a guard check only takes 3.8 ns. Moreover,
+a guard can watch multiple keys. For example, for an optimization using
+10 global variables in a function, the check costs 148 ns for 10 dict
+lookups, whereas the guard still only cost 3.8 ns when the version does
+not change (39x as fast).
+
+
+Integer overflow
+================
+
+The implementation uses the C unsigned integer type ``PY_INT64_T`` to
+store the version, a 64 bits unsigned integer. The C code uses
+``version++``. On integer overflow, the version is wrapped to ``0`` (and
+then continue to be incremented) according to the C standard.
+
+After an integer overflow, a guard can succeed whereas the watched
+dictionary key was modified. The bug occurs if the dictionary is
+modified at least ``2 ** 64`` times between two checks of the guard and
+if the new version (theorical value with no integer overflow) is equal
+to the old version modulo ``2 ** 64``.
+
+If a dictionary is modified each nanosecond, an overflow takes longer
+than 584 years. Using a 32-bit version, the overflow occurs only after 4
+seconds. That's why a 64-bit unsigned type is also used on 32-bit
+systems.
+
+A risk of a bug every 584 years is acceptable.
+
 
 Alternatives
 ============
 
-Bikeshedding on the property name
----------------------------------
+Expose the version at Python level as a read-only __version__ property
+----------------------------------------------------------------------
+
+The first version of the PEP proposed to expose the dictionary version
+as a read-only ``__version__`` property at Python level, and also to add
+the property to ``collections.UserDict`` (since this type must mimick
+the ``dict`` API).
+
+There are multiple issues:
+
+* To be consistent and avoid bad surprises, the version must be added to
+  all mapping type. Implementing a new mapping type would require extra
+  work for no benefit, since the version is only required on the
+  ``dict`` type in practice.
+* All Python implementations must implement this new property, it gives
+  more work to other implementations, whereas they may not use the
+  dictionary version at all.
+* The ``__version__`` can be wrapped on integer overflow. It is error
+  prone: using ``dict.__version__ <= guard_version`` is wrong,
+  ``dict.__version__ == guard_version`` must be used instead to reduce
+  the risk of bug on integer overflow (even if the integer overflow is
+  unlikely in practice).
+* Exposing the dictioanry version can lead the
+  false assumption on performances. Checking ``dict.__version__`` at
+  the Python level is not faster than a dictionary lookup. The lookup
+  has a cost of 48.7 ns and checking a guard has a cost of 47.5 ns, the
+  difference is only 1.2 ns (3%)::
+
+
+    $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
+    10000000 loops, best of 3: 0.0487 usec per loop
+    $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
+    10000000 loops, best of 3: 0.0475 usec per loop
+
+Bikeshedding on the property name:
 
 * ``__cache_token__``: name proposed by Nick Coghlan, name coming from
   `abc.get_cache_token()
@@ -233,13 +279,6 @@
 * ``__timestamp__``
 
 
-Add a version but don't expose it at the Python level
------------------------------------------------------
-
-Guards can be implemented in C to access directly the ``ma_version``
-field of the ``PyDictObject`` structure.
-
-
 Add a version to each dict entry
 --------------------------------
 
@@ -254,8 +293,8 @@
 the entry version is set to the dictionary version which is incremented
 at each change (create, modify, delete).
 
-Pseudo-code of an efficient guard to check if a dict key was modified
-using ``getversion()``::
+Pseudo-code of an fast guard to check if a dict key was modified using
+``getversion()``::
 
     UNSET = object()
 
@@ -263,12 +302,12 @@
         def __init__(self, dict, key):
             self.dict = dict
             self.key = key
-            self.dict_version = dict.__version__
+            self.dict_version = get_dict_version(dict)
             self.entry_version = dict.getversion(key)
 
         def check(self):
             """Return True if the dictionary value did not changed."""
-            dict_version = self.dict.__version__
+            dict_version = get_dict_version(self.dict)
             if dict_version == self.version:
                 # Fast-path: avoid the dictionary lookup
                 return True
@@ -366,42 +405,55 @@
 On integer overflow, the whole cache is cleared and the global version
 tag is reset to ``0``.
 
-See also `issue #1685986: Method cache
-<https://bugs.python.org/issue1685986>`_ and `issue #1700288: Armin's
-method cache optimization updated for Python 2.6
+See `Method cache (issue #1685986)
+<https://bugs.python.org/issue1685986>`_ and `Armin's method cache
+optimization updated for Python 2.6 (issue #1700288)
 <https://bugs.python.org/issue1700288>`_.
 
 
+Globals / builtins cache
+------------------------
+
+In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue
+#10401) <http://bugs.python.org/issue10401>`_ which adds a private
+``ma_version`` field to the ``PyDictObject`` structure (``dict`` type),
+the field has the C type ``Py_ssize_t``.
+
+The patch adds a "global and builtin cache" to functions and frames, and
+changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the
+cache.
+
+The change on the ``PyDictObject`` structure is very similar to this
+PEP.
+
+
+Cached globals+builtins lookup
+------------------------------
+
+In 2006, Andrea Griffini proposed a patch implementing a `Cached
+globals+builtins lookup optimization
+<https://bugs.python.org/issue1616125>`_.  The patch adds a private
+``timestamp`` field to the ``PyDictObject`` structure (``dict`` type),
+the field has the C type ``size_t``.
+
+Thread on python-dev: `About dictionary lookup caching
+<https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_.
+
+
 Guard against changing dict during iteration
 --------------------------------------------
 
-In 2013, Serhiy Storchaka proposed a patch for the `issue #19332: Guard
-against changing dict during iteration
-<https://bugs.python.org/issue19332>`_ (mentioned above) which adds a
-``size_t ma_count`` field to the ``PyDictObject`` structure. This field
-is incremented when the dictionary is modified, and so is very similar
-to the proposed ``dict.__version__``.
+In 2013, Serhiy Storchaka proposed `Guard against changing dict during
+iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which
+adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict``
+type), the field has the C type ``size_t``.  This field is incremented
+when the dictionary is modified, and so is very similar to the proposed
+dictionary version.
 
-
-Cached globals+builtins lookup
-------------------------------
-
-In 2006, Andrea Griffini proposes a patch implementing a `Cached
-globals+builtins lookup optimization <https://bugs.python.org/issue1616125>`_.
-The patch adds a private ``timestamp`` field to dict.
-
-See the thread on python-dev: `About dictionary lookup caching
-<https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_.
-
-
-Globals / builtins cache
-------------------------
-
-In 2010, Antoine Pitrou proposed a `Globals / builtins cache
-<http://bugs.python.org/issue10401>`_ which adds a private
-``ma_version`` field to the ``dict`` type. The patch adds a "global and
-builtin cache" to functions and frames, and changes ``LOAD_GLOBAL`` and
-``STORE_GLOBAL`` instructions to use the cache.
+Sadly, the dictionary version proposed in this PEP doesn't help to
+detect dictionary mutation. The dictionary version changes when values
+are replaced, whereas modifying dictionary values while iterating on
+dictionary keys is legit in Python.
 
 
 PySizer

-- 
Repository URL: https://hg.python.org/peps


More information about the Python-checkins mailing list