PEP 509: Add a private version to dict (version 3)
![](https://secure.gravatar.com/avatar/daa45563a98419bb1b6b63904ce71f95.jpg?s=120&d=mm&r=g)
Hi, Below if the third version of my PEP 509 (dict version). Changes since the version 2: * __setitem__() and update() now always increases the version: remove the micro-optimization on "dict[key] is new_value". Exception: version is not changed with dict.update() is called without argument. * be more explict on version++: explain that the operation must be atomic, and that dict methods are already atomic thanks to the GIL * Usage of the dict version: add Cython * "Guard against changing dict during iteration": don't guess if the new dict version can be used or not. Let's discuss that later. * rephrase/complete some sections * add links to new threads on python-dev I hope that I addressed all Jim's concerns about the version 2. Note: I also updated the implementation. The implementation now contains more tests for identical values and more tests on equal values. HTML version: https://www.python.org/dev/peps/pep-0509/ PEP: 509 Title: Add a private version to dict Version: $Revision$ Last-Modified: $Date$ Author: Victor Stinner <victor.stinner@gmail.com> Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 4-January-2016 Python-Version: 3.6 Abstract ======== Add a new private version to the builtin ``dict`` type, incremented at each dictionary creation and at each dictionary change, to implement fast guards on namespaces. Rationale ========= In Python, the builtin ``dict`` type is used by many instructions. For example, the ``LOAD_GLOBAL`` instruction looks up a variable in the global namespace, or in the builtins namespace (two dict lookups). Python uses ``dict`` for the builtins namespace, globals namespace, type namespaces, instance namespaces, etc. The local namespace (function namespace) is usually optimized to an array, but it can be a dict too. Python is hard to optimize because almost everything is mutable: builtin functions, function code, global variables, local variables, ... can be modified at runtime. Implementing optimizations respecting the Python semantics requires to detect when "something changes": we will call these checks "guards". The speedup of optimizations depends on the speed of guard checks. This PEP proposes to add a private version to dictionaries to implement fast guards on namespaces. Dictionary lookups can be skipped if the version does not change which is the common case for most namespaces. The version is globally unique, so checking the version is also enough to check if the namespace dictionary was not replaced with a new dictionary. When the dictionary version does not change, the performance of a guard does not depend on the number of watched dictionary entries: the complexity is O(1). Example of optimization: copy the value of a global variable to function constants. This optimization requires a guard on the global variable to check if it was modified. If the global variable is not modified, the function uses the cached copy. If the global variable is modified, the function uses a regular lookup, and maybe also deoptimize the function (to remove the overhead of the guard check for next function calls). See the `PEP 510 -- Specialized functions with guards <https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of guards to specialize functions and for a more general rationale on Python static optimizers. Guard example ============= Pseudo-code of an fast guard to check if a dictionary entry was modified (created, updated or deleted) using an hypothetical ``dict_get_version(dict)`` function:: UNSET = object() class GuardDictKey: def __init__(self, dict, key): self.dict = dict self.key = key self.value = dict.get(key, UNSET) self.version = dict_get_version(dict) def check(self): """Return True if the dictionary entry did not change and the dictionary was not replaced.""" # read the version of the dictionary version = dict_get_version(self.dict) if version == self.version: # Fast-path: dictionary lookup avoided return True # lookup in the dictionary value = self.dict.get(self.key, UNSET) if value is self.value: # another key was modified: # cache the new dictionary version self.version = version return True # the key was modified return False Usage of the dict version ========================= Speedup method calls -------------------- Yury Selivanov wrote a `patch to optimize method calls <https://bugs.python.org/issue26110>`_. The patch depends on the `"implement per-opcode cache in ceval" <https://bugs.python.org/issue26219>`_ patch which requires dictionary versions to invalidate the cache if the globals dictionary or the builtins dictionary has been modified. The cache also requires that the dictionary version is globally unique. It is possible to define a function in a namespace and call it in a different namespace, using ``exec()`` with the *globals* parameter for example. In this case, the globals dictionary was replaced and the cache must also be invalidated. Specialized functions using guards ---------------------------------- The `PEP 510 -- Specialized functions with guards <https://www.python.org/dev/peps/pep-0510/>`_ proposes an API to support specialized functions with guards. It allows to implement static optimizers for Python without breaking the Python semantics. The `fatoptimizer <http://fatoptimizer.readthedocs.org/>`_ of the `FAT Python <http://faster-cpython.readthedocs.org/fat_python.html>`_ project is an example of a static Python optimizer. It implements many optimizations which require guards on namespaces: * Call pure builtins: to replace ``len("abc")`` with ``3``, guards on ``builtins.__dict__['len']`` and ``globals()['len']`` are required * Loop unrolling: to unroll the loop ``for i in range(...): ...``, guards on ``builtins.__dict__['range']`` and ``globals()['range']`` are required * etc. Pyjion ------ According of Brett Cannon, one of the two main developers of Pyjion, Pyjion can benefit from dictionary version to implement optimizations. `Pyjion <https://github.com/Microsoft/Pyjion>`_ is a JIT compiler for Python based upon CoreCLR (Microsoft .NET Core runtime). Cython ------ Cython can benefit from dictionary version to implement optimizations. `Cython <http://cython.org/>`_ is an optimising static compiler for both the Python programming language and the extended Cython programming language. Unladen Swallow --------------- Even if dictionary version was not explicitly mentioned, optimizing globals and builtins lookup was part of the Unladen Swallow plan: "Implement one of the several proposed schemes for speeding lookups of globals and builtins." (source: `Unladen Swallow ProjectPlan <https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_). Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler implemented with LLVM. The project stopped in 2011: `Unladen Swallow Retrospective <http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html>`_. Changes ======= Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with the C type ``PY_UINT64_T``, 64-bit unsigned integer. Add also a global dictionary version. Each time a dictionary is created, the global version is incremented and the dictionary version is initialized to the global version. The global version is also incremented and copied to the dictionary version at each dictionary change: * ``clear()`` if the dict is non-empty * ``pop(key)`` if the key exists * ``popitem()`` if the dict is non-empty * ``setdefault(key, value)`` if the key does not exist * ``__delitem__(key)`` if the key exists * ``__setitem__(key, value)`` always increases the version * ``update(...)`` if called with arguments The version increase must be atomic. In CPython, the Global Interpreter Lock (GIL) already protects ``dict`` methods to make them atomic. Example using an hypothetical ``dict_get_version(dict)`` function:: >>> d = {} >>> dict_get_version(d) 100 >>> d['key'] = 'value' >>> dict_get_version(d) 101 >>> d['key'] = 'new value' >>> dict_get_version(d) 102 >>> del d['key'] >>> dict_get_version(d) 103 ``dict.__setitem__(key, value)`` and ``dict.update(...)`` always increases the version, even if the new value is identical or is equal to the current value (even if ``(dict[key] is value) or (dict[key] == value)``). The field is called ``ma_version_tag``, rather than ``ma_version``, to suggest to compare it using ``version_tag == old_version_tag``, rather than ``version <= old_version`` which is wrong most of the time after an integer overflow. Backwards Compatibility ======================= Since the ``PyDictObject`` structure is not part of the stable ABI and the new dictionary version not exposed at the Python scope, changes are backward compatible. Implementation and Performance ============================== The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject <https://bugs.python.org/issue26058>`_ contains a patch implementing this PEP. On pybench and timeit microbenchmarks, the patch does not seem to add any overhead on dictionary operations. For example, the following timeit micro-benchmarks takes 318 nanoseconds before and after the change:: python3.6 -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()' When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover, a guard can watch for multiple keys. For example, for an optimization using 10 global variables in a function, 10 dictionary lookups costs 148 ns, whereas the guard still only costs 3.8 ns when the version does not change (39x as fast). The `fat module <http://fatoptimizer.readthedocs.org/en/latest/fat.html>`_ implements such guards: ``fat.GuardDict`` is based on the dictionary version. Integer overflow ================ The implementation uses the C type ``PY_UINT64_T`` to store the version: a 64 bits unsigned integer. The C code uses ``version++``. On integer overflow, the version is wrapped to ``0`` (and then continue to be incremented) according to the C standard. After an integer overflow, a guard can succeed whereas the watched dictionary key was modified. The bug only occurs at a guard check if there are exaclty ``2 ** 64`` dictionary creations or modifications since the previous guard check. If a dictionary is modified every nanosecond, ``2 ** 64`` modifications takes longer than 584 years. Using a 32-bit version, it only takes 4 seconds. That's why a 64-bit unsigned type is also used on 32-bit systems. A dictionary lookup at the C level takes 14.8 ns. A risk of a bug every 584 years is acceptable. Alternatives ============ Expose the version at Python level as a read-only __version__ property ---------------------------------------------------------------------- The first version of the PEP proposed to expose the dictionary version as a read-only ``__version__`` property at Python level, and also to add the property to ``collections.UserDict`` (since this type must mimick the ``dict`` API). There are multiple issues: * To be consistent and avoid bad surprises, the version must be added to all mapping types. Implementing a new mapping type would require extra work for no benefit, since the version is only required on the ``dict`` type in practice. * All Python implementations would have to implement this new property, it gives more work to other implementations, whereas they may not use the dictionary version at all. * Exposing the dictionary version at the Python level can lead the false assumption on performances. Checking ``dict.__version__`` at the Python level is not faster than a dictionary lookup. A dictionary lookup in Python has a cost of 48.7 ns and checking the version has a cost of 47.5 ns, the difference is only 1.2 ns (3%):: $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33' 10000000 loops, best of 3: 0.0487 usec per loop $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100' 10000000 loops, best of 3: 0.0475 usec per loop * The ``__version__`` can be wrapped on integer overflow. It is error prone: using ``dict.__version__ <= guard_version`` is wrong, ``dict.__version__ == guard_version`` must be used instead to reduce the risk of bug on integer overflow (even if the integer overflow is unlikely in practice). Mandatory bikeshedding on the property name: * ``__cache_token__``: name proposed by Nick Coghlan, name coming from `abc.get_cache_token() <https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_. * ``__version__`` * ``__version_tag__`` * ``__timestamp__`` Add a version to each dict entry -------------------------------- A single version per dictionary requires to keep a strong reference to the value which can keep the value alive longer than expected. If we add also a version per dictionary entry, the guard can only store the entry version (a simple integer) to avoid the strong reference to the value: only strong references to the dictionary and to the key are needed. Changes: add a ``me_version_tag`` field to the ``PyDictKeyEntry`` structure, the field has the C type ``PY_UINT64_T``. When a key is created or modified, the entry version is set to the dictionary version which is incremented at any change (create, modify, delete). Pseudo-code of an fast guard to check if a dictionary key was modified using hypothetical ``dict_get_version(dict)`` and ``dict_get_entry_version(dict)`` functions:: UNSET = object() class GuardDictKey: def __init__(self, dict, key): self.dict = dict self.key = key self.dict_version = dict_get_version(dict) self.entry_version = dict_get_entry_version(dict, key) def check(self): """Return True if the dictionary entry did not change and the dictionary was not replaced.""" # read the version of the dictionary dict_version = dict_get_version(self.dict) if dict_version == self.version: # Fast-path: dictionary lookup avoided return True # lookup in the dictionary to read the entry version entry_version = get_dict_key_version(dict, key) if entry_version == self.entry_version: # another key was modified: # cache the new dictionary version self.dict_version = dict_version self.entry_version = entry_version return True # the key was modified return False The main drawback of this option is the impact on the memory footprint. It increases the size of each dictionary entry, so the overhead depends on the number of buckets (dictionary entries, used or not used). For example, it increases the size of each dictionary entry by 8 bytes on 64-bit system. In Python, the memory footprint matters and the trend is to reduce it. Examples: * `PEP 393 -- Flexible String Representation <https://www.python.org/dev/peps/pep-0393/>`_ * `PEP 412 -- Key-Sharing Dictionary <https://www.python.org/dev/peps/pep-0412/>`_ Add a new dict subtype ---------------------- Add a new ``verdict`` type, subtype of ``dict``. When guards are needed, use the ``verdict`` for namespaces (module namespace, type namespace, instance namespace, etc.) instead of ``dict``. Leave the ``dict`` type unchanged to not add any overhead (CPU, memory footprint) when guards are not used. Technical issue: a lot of C code in the wild, including CPython core, expecting the exact ``dict`` type. Issues: * ``exec()`` requires a ``dict`` for globals and locals. A lot of code use ``globals={}``. It is not possible to cast the ``dict`` to a ``dict`` subtype because the caller expects the ``globals`` parameter to be modified (``dict`` is mutable). * C functions call directly ``PyDict_xxx()`` functions, instead of calling ``PyObject_xxx()`` if the object is a ``dict`` subtype * ``PyDict_CheckExact()`` check fails on ``dict`` subtype, whereas some functions require the exact ``dict`` type. * ``Python/ceval.c`` does not completely supports dict subtypes for namespaces The ``exec()`` issue is a blocker issue. Other issues: * The garbage collector has a special code to "untrack" ``dict`` instances. If a ``dict`` subtype is used for namespaces, the garbage collector can be unable to break some reference cycles. * Some functions have a fast-path for ``dict`` which would not be taken for ``dict`` subtypes, and so it would make Python a little bit slower. Prior Art ========= Method cache and type version tag --------------------------------- In 2007, Armin Rigo wrote a patch to to implement a cache of methods. It was merged into Python 2.6. The patch adds a "type attribute cache version tag" (``tp_version_tag``) and a "valid version tag" flag to types (the ``PyTypeObject`` structure). The type version tag is not exposed at the Python level. The version tag has the C type ``unsigned int``. The cache is a global hash table of 4096 entries, shared by all types. The cache is global to "make it fast, have a deterministic and low memory footprint, and be easy to invalidate". Each cache entry has a version tag. A global version tag is used to create the next version tag, it also has the C type ``unsigned int``. By default, a type has its "valid version tag" flag cleared to indicate that the version tag is invalid. When the first method of the type is cached, the version tag and the "valid version tag" flag are set. When a type is modified, the "valid version tag" flag of the type and its subclasses is cleared. Later, when a cache entry of these types is used, the entry is removed because its version tag is outdated. On integer overflow, the whole cache is cleared and the global version tag is reset to ``0``. See `Method cache (issue #1685986) <https://bugs.python.org/issue1685986>`_ and `Armin's method cache optimization updated for Python 2.6 (issue #1700288) <https://bugs.python.org/issue1700288>`_. Globals / builtins cache ------------------------ In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue #10401) <http://bugs.python.org/issue10401>`_ which adds a private ``ma_version`` field to the ``PyDictObject`` structure (``dict`` type), the field has the C type ``Py_ssize_t``. The patch adds a "global and builtin cache" to functions and frames, and changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the cache. The change on the ``PyDictObject`` structure is very similar to this PEP. Cached globals+builtins lookup ------------------------------ In 2006, Andrea Griffini proposed a patch implementing a `Cached globals+builtins lookup optimization <https://bugs.python.org/issue1616125>`_. The patch adds a private ``timestamp`` field to the ``PyDictObject`` structure (``dict`` type), the field has the C type ``size_t``. Thread on python-dev: `About dictionary lookup caching <https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_ (December 2006). Guard against changing dict during iteration -------------------------------------------- In 2013, Serhiy Storchaka proposed `Guard against changing dict during iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict`` type), the field has the C type ``size_t``. This field is incremented when the dictionary is modified. PySizer ------- `PySizer <http://pysizer.8325.org/>`_: a memory profiler for Python, Google Summer of Code 2005 project by Nick Smallbone. This project has a patch for CPython 2.4 which adds ``key_time`` and ``value_time`` fields to dictionary entries. It uses a global process-wide counter for dictionaries, incremented each time that a dictionary is modified. The times are used to decide when child objects first appeared in their parent objects. Discussion ========== Thread on the mailing lists: * python-dev: `Updated PEP 509 <https://mail.python.org/pipermail/python-dev/2016-April/144250.html>`_ * python-dev: `RFC: PEP 509: Add a private version to dict <https://mail.python.org/pipermail/python-dev/2016-April/144137.html>`_ * python-dev: `PEP 509: Add a private version to dict <https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_ (january 2016) * python-ideas: `RFC: PEP: Add dict.__version__ <https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_ (january 2016) Copyright ========= This document has been placed in the public domain.
![](https://secure.gravatar.com/avatar/daa45563a98419bb1b6b63904ce71f95.jpg?s=120&d=mm&r=g)
Hi,
My current implementation inserts the new ma_version_tag field in the middle of the PyDictObject structure, so it obviously changes the ABI. Can someone please confirm (double check) that the PyDictObject structure is explicitly excluded from the stable ABI? I'm talking about about the "#ifndef Py_LIMITED_API" in Include/dictobject.h. I understood what is an ABI in the hard way. When I ran the perf.py benchmark, I got a crash in ctypes on django_v3. The ctypes module uses a C type which inherits from the dict type. I compiled Python with and without my patch in the same directory and then I renamed the ./python binary, but the _ctypes.so was shared between the two binaries. Victor
![](https://secure.gravatar.com/avatar/daa45563a98419bb1b6b63904ce71f95.jpg?s=120&d=mm&r=g)
Hi, Guido van Rossum and Jim J. Jewett suggested me to *not require* to always increase the dict version if a dict method does not modify its content. I modified the Changes section to only require that the version is increased when the dictionary content is modified. I also explained the nice side effect of having an unique identifier for two empty dictionaries: it avoids a strong reference when checking if a namespace (dictionary) was replaced. Yury Selivanov's opcode cache uses this property to avoid a strong refence on builtin and global namespaces. It's also a written reply to Armin Rigo's suggestion to use the version 0 for empty dictionaries (new empty dict or for dict.clear()). The modified Changes section: Changes ======= Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with the C type ``PY_UINT64_T``, 64-bit unsigned integer. Add also a global dictionary version. Each time a dictionary is created, the global version is incremented and the dictionary version is initialized to the global version. Each time the dictionary content is modified, the global version must be incremented and copied to the dictionary version. Dictionary methods which can modify its content: * ``clear()`` * ``pop(key)`` * ``popitem()`` * ``setdefault(key, value)`` * ``__delitem__(key)`` * ``__setitem__(key, value)`` * ``update(...)`` The choice of increasing or not the version when a dictionary method does not change its content is left to the Python implementation. A Python implementation can decide to not increase the version to avoid dictionary lookups in guards. Examples of cases when dictionary methods don't modify its content: * ``clear()`` if the dict is already empty * ``pop(key)`` if the key does not exist * ``popitem()`` if the dict is empty * ``setdefault(key, value)`` if the key already exists * ``__delitem__(key)`` if the key does not exist * ``__setitem__(key, value)`` if the new value is identical to the current value * ``update()`` if called without argument or if new values are identical to current values Setting a key to a new value equals to the old value is also considered as an operation modifying the dictionary content. Two different empty dictionaries must have a different version to be able to identify a dictionary just by its version. It allows to verify in a guard that a namespace was not replaced without storing a strong reference to the dictionary. Using a borrowed reference does not work: if the old dictionary is destroyed, it is possible that a new dictionary is allocated at the same memory address. By the way, dictionaries don't support weak references. The version increase must be atomic. In CPython, the Global Interpreter Lock (GIL) already protects ``dict`` methods to make changes atomic. Example using an hypothetical ``dict_get_version(dict)`` function:: >>> d = {} >>> dict_get_version(d) 100 >>> d['key'] = 'value' >>> dict_get_version(d) 101 >>> d['key'] = 'new value' >>> dict_get_version(d) 102 >>> del d['key'] >>> dict_get_version(d) 103 The field is called ``ma_version_tag``, rather than ``ma_version``, to suggest to compare it using ``version_tag == old_version_tag``, rather than ``version <= old_version`` which becomes wrong after an integer overflow. -- Victor
![](https://secure.gravatar.com/avatar/daa45563a98419bb1b6b63904ce71f95.jpg?s=120&d=mm&r=g)
Hi,
My current implementation inserts the new ma_version_tag field in the middle of the PyDictObject structure, so it obviously changes the ABI. Can someone please confirm (double check) that the PyDictObject structure is explicitly excluded from the stable ABI? I'm talking about about the "#ifndef Py_LIMITED_API" in Include/dictobject.h. I understood what is an ABI in the hard way. When I ran the perf.py benchmark, I got a crash in ctypes on django_v3. The ctypes module uses a C type which inherits from the dict type. I compiled Python with and without my patch in the same directory and then I renamed the ./python binary, but the _ctypes.so was shared between the two binaries. Victor
![](https://secure.gravatar.com/avatar/daa45563a98419bb1b6b63904ce71f95.jpg?s=120&d=mm&r=g)
Hi, Guido van Rossum and Jim J. Jewett suggested me to *not require* to always increase the dict version if a dict method does not modify its content. I modified the Changes section to only require that the version is increased when the dictionary content is modified. I also explained the nice side effect of having an unique identifier for two empty dictionaries: it avoids a strong reference when checking if a namespace (dictionary) was replaced. Yury Selivanov's opcode cache uses this property to avoid a strong refence on builtin and global namespaces. It's also a written reply to Armin Rigo's suggestion to use the version 0 for empty dictionaries (new empty dict or for dict.clear()). The modified Changes section: Changes ======= Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with the C type ``PY_UINT64_T``, 64-bit unsigned integer. Add also a global dictionary version. Each time a dictionary is created, the global version is incremented and the dictionary version is initialized to the global version. Each time the dictionary content is modified, the global version must be incremented and copied to the dictionary version. Dictionary methods which can modify its content: * ``clear()`` * ``pop(key)`` * ``popitem()`` * ``setdefault(key, value)`` * ``__delitem__(key)`` * ``__setitem__(key, value)`` * ``update(...)`` The choice of increasing or not the version when a dictionary method does not change its content is left to the Python implementation. A Python implementation can decide to not increase the version to avoid dictionary lookups in guards. Examples of cases when dictionary methods don't modify its content: * ``clear()`` if the dict is already empty * ``pop(key)`` if the key does not exist * ``popitem()`` if the dict is empty * ``setdefault(key, value)`` if the key already exists * ``__delitem__(key)`` if the key does not exist * ``__setitem__(key, value)`` if the new value is identical to the current value * ``update()`` if called without argument or if new values are identical to current values Setting a key to a new value equals to the old value is also considered as an operation modifying the dictionary content. Two different empty dictionaries must have a different version to be able to identify a dictionary just by its version. It allows to verify in a guard that a namespace was not replaced without storing a strong reference to the dictionary. Using a borrowed reference does not work: if the old dictionary is destroyed, it is possible that a new dictionary is allocated at the same memory address. By the way, dictionaries don't support weak references. The version increase must be atomic. In CPython, the Global Interpreter Lock (GIL) already protects ``dict`` methods to make changes atomic. Example using an hypothetical ``dict_get_version(dict)`` function:: >>> d = {} >>> dict_get_version(d) 100 >>> d['key'] = 'value' >>> dict_get_version(d) 101 >>> d['key'] = 'new value' >>> dict_get_version(d) 102 >>> del d['key'] >>> dict_get_version(d) 103 The field is called ``ma_version_tag``, rather than ``ma_version``, to suggest to compare it using ``version_tag == old_version_tag``, rather than ``version <= old_version`` which becomes wrong after an integer overflow. -- Victor
participants (1)
-
Victor Stinner