[Python-Dev] RFC: PEP 509: Add a private version to dict

Guido van Rossum guido at python.org
Thu Apr 14 12:59:50 EDT 2016


I'll wait a day before formally pronouncing to see if any objections
are made, but it looks good to me.

On Thu, Apr 14, 2016 at 8:19 AM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> Hi,
>
> I updated my PEP 509 to make the dictionary version globally unique.
> With *two* use cases of this PEP (Yury's method call patch and my FAT
> Python project), I think that the PEP is now ready to be accepted.
>
> Globally unique identifier is a requirement for Yury's patch
> optimizing method calls ( https://bugs.python.org/issue26110 ). It
> allows to check for free if the dictionary was replaced.
>
> I also renamed the ma_version field to ma_version_tag.
>
> HTML version:
> https://www.python.org/dev/peps/pep-0509/
>
> Victor
>
>
> PEP: 509
> Title: Add a private version to dict
> Version: $Revision$
> Last-Modified: $Date$
> Author: Victor Stinner <victor.stinner at gmail.com>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 4-January-2016
> Python-Version: 3.6
>
>
> Abstract
> ========
>
> Add a new private version to the builtin ``dict`` type, incremented at
> each dictionary creation and at each dictionary change, to implement
> fast guards on namespaces.
>
>
> Rationale
> =========
>
> In Python, the builtin ``dict`` type is used by many instructions. For
> example, the ``LOAD_GLOBAL`` instruction searchs for a variable in the
> global namespace, or in the builtins namespace (two dict lookups).
> Python uses ``dict`` for the builtins namespace, globals namespace, type
> namespaces, instance namespaces, etc. The local namespace (namespace of
> a function) is usually optimized to an array, but it can be a dict too.
>
> Python is hard to optimize because almost everything is mutable: builtin
> functions, function code, global variables, local variables, ... can be
> modified at runtime. Implementing optimizations respecting the Python
> semantics requires to detect when "something changes": we will call
> these checks "guards".
>
> The speedup of optimizations depends on the speed of guard checks. This
> PEP proposes to add a version to dictionaries to implement fast guards
> on namespaces.
>
> Dictionary lookups can be skipped if the version does not change which
> is the common case for most namespaces. Since the version is globally
> unique, the version is also enough to check if the namespace dictionary
> was not replaced with a new dictionary. The performance of a guard does
> not depend on the number of watched dictionary entries, complexity of
> O(1), if the dictionary version does not change.
>
> Example of optimization: copy the value of a global variable to function
> constants.  This optimization requires a guard on the global variable to
> check if it was modified. If the variable is modified, the variable must
> be loaded at runtime when the function is called, instead of using the
> constant.
>
> See the `PEP 510 -- Specialized functions with guards
> <https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of
> guards to specialize functions and for the rationale on Python static
> optimizers.
>
>
> Guard example
> =============
>
> Pseudo-code of an fast guard to check if a dictionary entry was modified
> (created, updated or deleted) using an hypothetical
> ``dict_get_version(dict)`` function::
>
>     UNSET = object()
>
>     class GuardDictKey:
>         def __init__(self, dict, key):
>             self.dict = dict
>             self.key = key
>             self.value = dict.get(key, UNSET)
>             self.version = dict_get_version(dict)
>
>         def check(self):
>             """Return True if the dictionary entry did not changed
>             and the dictionary was not replaced."""
>
>             # read the version of the dict structure
>             version = dict_get_version(self.dict)
>             if version == self.version:
>                 # Fast-path: dictionary lookup avoided
>                 return True
>
>             # lookup in the dictionary
>             value = self.dict.get(self.key, UNSET)
>             if value is self.value:
>                 # another key was modified:
>                 # cache the new dictionary version
>                 self.version = version
>                 return True
>
>             # the key was modified
>             return False
>
>
> Usage of the dict version
> =========================
>
> Speedup method calls 1.2x
> -------------------------
>
> Yury Selivanov wrote a `patch to optimize method calls
> <https://bugs.python.org/issue26110>`_. The patch depends on the
> `implement per-opcode cache in ceval
> <https://bugs.python.org/issue26219>`_ patch which requires dictionary
> versions to invalidate the cache if the globals dictionary or the
> builtins dictionary has been modified.
>
> The cache also requires that the dictionary version is globally unique.
> It is possible to define a function in a namespace and call it
> in a different namespace: using ``exec()`` with the *globals* parameter
> for example. In this case, the globals dictionary was changed and the
> cache must be invalidated.
>
>
> Specialized functions using guards
> ----------------------------------
>
> The `PEP 510 -- Specialized functions with guards
> <https://www.python.org/dev/peps/pep-0510/>`_ proposes an API to support
> specialized functions with guards. It allows to implement static
> optimizers for Python without breaking the Python semantics.
>
> Example of a static Python optimizer: the `fatoptimizer
> <http://fatoptimizer.readthedocs.org/>`_ of the `FAT Python
> <http://faster-cpython.readthedocs.org/fat_python.html>`_ project
> implements many optimizations which require guards on namespaces.
> Examples:
>
> * Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
>   ``builtins.__dict__['len']`` and ``globals()['len']`` are required
> * Loop unrolling: to unroll the loop ``for i in range(...): ...``,
>   guards on ``builtins.__dict__['range']`` and ``globals()['range']``
>   are required
>
>
> Pyjion
> ------
>
> According of Brett Cannon, one of the two main developers of Pyjion,
> Pyjion can also benefit from dictionary version to implement
> optimizations.
>
> Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET
> Core runtime).
>
>
> Unladen Swallow
> ---------------
>
> Even if dictionary version was not explicitly mentioned, optimizing
> globals and builtins lookup was part of the Unladen Swallow plan:
> "Implement one of the several proposed schemes for speeding lookups of
> globals and builtins." Source: `Unladen Swallow ProjectPlan
> <https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_.
>
> Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler
> implemented with LLVM. The project stopped in 2011: `Unladen Swallow
> Retrospective
> <http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html>`_.
>
>
> Changes
> =======
>
> Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
> the C type ``PY_INT64_T``, 64-bit unsigned integer. Add also a global
> dictionary version. Each time a dictionary is created, the global
> version is incremented and the dictionary version is initialized to the
> global version. The global version is also incremented and copied to the
> dictionary version at each dictionary change:
>
> * ``clear()`` if the dict was non-empty
> * ``pop(key)`` if the key exists
> * ``popitem()`` if the dict is non-empty
> * ``setdefault(key, value)`` if the `key` does not exist
> * ``__detitem__(key)`` if the key exists
> * ``__setitem__(key, value)`` if the `key` doesn't exist or if the value
>   is not ``dict[key]``
> * ``update(...)`` if new values are different than existing values:
>   values are compared by identity, not by their content; the version can
>   be incremented multiple times
>
> The ``PyDictObject`` structure is not part of the stable ABI.
>
> The field is called ``ma_version_tag`` rather than ``ma_version`` to
> suggest to compare it using ``version_tag == old_version_tag`` rather
> than ``version <= old_version`` which makes the integer overflow much
> likely.
>
> Example using an hypothetical ``dict_get_version(dict)`` function::
>
>     >>> d = {}
>     >>> dict_get_version(d)
>     100
>     >>> d['key'] = 'value'
>     >>> dict_get_version(d)
>     101
>     >>> d['key'] = 'new value'
>     >>> dict_get_version(d)
>     102
>     >>> del d['key']
>     >>> dict_get_version(d)
>     103
>
> The version is not incremented if an existing key is set to the same
> value. For efficiency, values are compared by their identity:
> ``new_value is old_value``, not by their content:
> ``new_value == old_value``. Example::
>
>     >>> d = {}
>     >>> value = object()
>     >>> d['key'] = value
>     >>> dict_get_version(d)
>     40
>     >>> d['key'] = value
>     >>> dict_get_version(d)
>     40
>
> .. note::
>    CPython uses some singleton like integers in the range [-5; 257],
>    empty tuple, empty strings, Unicode strings of a single character in
>    the range [U+0000; U+00FF], etc. When a key is set twice to the same
>    singleton, the version is not modified.
>
>
> Implementation and Performance
> ==============================
>
> The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject
> <https://bugs.python.org/issue26058>`_ contains a patch implementing
> this PEP.
>
> On pybench and timeit microbenchmarks, the patch does not seem to add
> any overhead on dictionary operations.
>
> When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for
> a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover,
> a guard can watch for multiple keys. For example, for an optimization
> using 10 global variables in a function, 10 dictionary lookups costs 148
> ns, whereas the guard still only costs 3.8 ns when the version does not
> change (39x as fast).
>
> The `fat module
> <http://fatoptimizer.readthedocs.org/en/latest/fat.html>`_ implements
> such guards: ``fat.GuardDict`` is based on the dictionary version.
>
>
> Integer overflow
> ================
>
> The implementation uses the C type ``PY_UINT64_T`` to store the version:
> a 64 bits unsigned integer. The C code uses ``version++``. On integer
> overflow, the version is wrapped to ``0`` (and then continue to be
> incremented) according to the C standard.
>
> After an integer overflow, a guard can succeed whereas the watched
> dictionary key was modified. The bug only occurs at a guard check if
> there are exaclty ``2 ** 64`` dictionary creations or modifications
> since the previous guard check.
>
> If a dictionary is modified every nanosecond, ``2 ** 64`` modifications
> takes longer than 584 years. Using a 32-bit version, it only takes 4
> seconds. That's why a 64-bit unsigned type is also used on 32-bit
> systems. A dictionary lookup at the C level takes 14.8 ns.
>
> A risk of a bug every 584 years is acceptable.
>
>
> Alternatives
> ============
>
> Expose the version at Python level as a read-only __version__ property
> ----------------------------------------------------------------------
>
> The first version of the PEP proposed to expose the dictionary version
> as a read-only ``__version__`` property at Python level, and also to add
> the property to ``collections.UserDict`` (since this type must mimick
> the ``dict`` API).
>
> There are multiple issues:
>
> * To be consistent and avoid bad surprises, the version must be added to
>   all mapping types. Implementing a new mapping type would require extra
>   work for no benefit, since the version is only required on the
>   ``dict`` type in practice.
> * All Python implementations must implement this new property, it gives
>   more work to other implementations, whereas they may not use the
>   dictionary version at all.
> * Exposing the dictionary version at Python level can lead the
>   false assumption on performances. Checking ``dict.__version__`` at
>   the Python level is not faster than a dictionary lookup. A dictionary
>   lookup has a cost of 48.7 ns and checking a guard has a cost of 47.5
>   ns, the difference is only 1.2 ns (3%)::
>
>
>     $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
>     10000000 loops, best of 3: 0.0487 usec per loop
>     $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}'
> 'd.__version__ == 100'
>     10000000 loops, best of 3: 0.0475 usec per loop
>
> * The ``__version__`` can be wrapped on integer overflow. It is error
>   prone: using ``dict.__version__ <= guard_version`` is wrong,
>   ``dict.__version__ == guard_version`` must be used instead to reduce
>   the risk of bug on integer overflow (even if the integer overflow is
>   unlikely in practice).
>
> Mandatory bikeshedding on the property name:
>
> * ``__cache_token__``: name proposed by Nick Coghlan, name coming from
>   `abc.get_cache_token()
>   <https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_.
> * ``__version__``
> * ``__timestamp__``
>
>
> Add a version to each dict entry
> --------------------------------
>
> A single version per dictionary requires to keep a strong reference to
> the value which can keep the value alive longer than expected. If we add
> also a version per dictionary entry, the guard can only store the entry
> version to avoid the strong reference to the value (only strong
> references to the dictionary and to the key are needed).
>
> Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure,
> the field has the C type ``PY_INT64_T``. When a key is created or
> modified, the entry version is set to the dictionary version which is
> incremented at any change (create, modify, delete).
>
> Pseudo-code of an fast guard to check if a dictionary key was modified
> using hypothetical ``dict_get_version(dict)``
> ``dict_get_entry_version(dict)`` functions::
>
>     UNSET = object()
>
>     class GuardDictKey:
>         def __init__(self, dict, key):
>             self.dict = dict
>             self.key = key
>             self.dict_version = dict_get_version(dict)
>             self.entry_version = dict_get_entry_version(dict, key)
>
>         def check(self):
>             """Return True if the dictionary entry did not changed
>             and the dictionary was not replaced."""
>
>             # read the version of the dict structure
>             dict_version = dict_get_version(self.dict)
>             if dict_version == self.version:
>                 # Fast-path: dictionary lookup avoided
>                 return True
>
>             # lookup in the dictionary
>             entry_version = get_dict_key_version(dict, key)
>             if entry_version == self.entry_version:
>                 # another key was modified:
>                 # cache the new dictionary version
>                 self.dict_version = dict_version
>                 return True
>
>             # the key was modified
>             return False
>
> The main drawback of this option is the impact on the memory footprint.
> It increases the size of each dictionary entry, so the overhead depends
> on the number of buckets (dictionary entries, used or unused yet). For
> example, it increases the size of each dictionary entry by 8 bytes on
> 64-bit system.
>
> In Python, the memory footprint matters and the trend is to reduce it.
> Examples:
>
> * `PEP 393 -- Flexible String Representation
>   <https://www.python.org/dev/peps/pep-0393/>`_
> * `PEP 412 -- Key-Sharing Dictionary
>   <https://www.python.org/dev/peps/pep-0412/>`_
>
>
> Add a new dict subtype
> ----------------------
>
> Add a new ``verdict`` type, subtype of ``dict``. When guards are needed,
> use the ``verdict`` for namespaces (module namespace, type namespace,
> instance namespace, etc.) instead of ``dict``.
>
> Leave the ``dict`` type unchanged to not add any overhead (memory
> footprint) when guards are not needed.
>
> Technical issue: a lot of C code in the wild, including CPython core,
> expecting the exact ``dict`` type. Issues:
>
> * ``exec()`` requires a ``dict`` for globals and locals. A lot of code
>   use ``globals={}``. It is not possible to cast the ``dict`` to a
>   ``dict`` subtype because the caller expects the ``globals`` parameter
>   to be modified (``dict`` is mutable).
> * Functions call directly ``PyDict_xxx()`` functions, instead of calling
>   ``PyObject_xxx()`` if the object is a ``dict`` subtype
> * ``PyDict_CheckExact()`` check fails on ``dict`` subtype, whereas some
>   functions require the exact ``dict`` type.
> * ``Python/ceval.c`` does not completely supports dict subtypes for
>   namespaces
>
>
> The ``exec()`` issue is a blocker issue.
>
> Other issues:
>
> * The garbage collector has a special code to "untrack" ``dict``
>   instances. If a ``dict`` subtype is used for namespaces, the garbage
>   collector can be unable to break some reference cycles.
> * Some functions have a fast-path for ``dict`` which would not be taken
>   for ``dict`` subtypes, and so it would make Python a little bit
>   slower.
>
>
> Prior Art
> =========
>
> Method cache and type version tag
> ---------------------------------
>
> In 2007, Armin Rigo wrote a patch to to implement a cache of methods. It
> was merged into Python 2.6.  The patch adds a "type attribute cache
> version tag" (``tp_version_tag``) and a "valid version tag" flag to
> types (the ``PyTypeObject`` structure).
>
> The type version tag is not available at the Python level.
>
> The version tag has the C type ``unsigned int``. The cache is a global
> hash table of 4096 entries, shared by all types. The cache is global to
> "make it fast, have a deterministic and low memory footprint, and be
> easy to invalidate". Each cache entry has a version tag. A global
> version tag is used to create the next version tag, it also has the C
> type ``unsigned int``.
>
> By default, a type has its "valid version tag" flag cleared to indicate
> that the version tag is invalid. When the first method of the type is
> cached, the version tag and the "valid version tag" flag are set. When a
> type is modified, the "valid version tag" flag of the type and its
> subclasses is cleared. Later, when a cache entry of these types is used,
> the entry is removed because its version tag is outdated.
>
> On integer overflow, the whole cache is cleared and the global version
> tag is reset to ``0``.
>
> See `Method cache (issue #1685986)
> <https://bugs.python.org/issue1685986>`_ and `Armin's method cache
> optimization updated for Python 2.6 (issue #1700288)
> <https://bugs.python.org/issue1700288>`_.
>
>
> Globals / builtins cache
> ------------------------
>
> In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue
> #10401) <http://bugs.python.org/issue10401>`_ which adds a private
> ``ma_version`` field to the ``PyDictObject`` structure (``dict`` type),
> the field has the C type ``Py_ssize_t``.
>
> The patch adds a "global and builtin cache" to functions and frames, and
> changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the
> cache.
>
> The change on the ``PyDictObject`` structure is very similar to this
> PEP.
>
>
> Cached globals+builtins lookup
> ------------------------------
>
> In 2006, Andrea Griffini proposed a patch implementing a `Cached
> globals+builtins lookup optimization
> <https://bugs.python.org/issue1616125>`_.  The patch adds a private
> ``timestamp`` field to the ``PyDictObject`` structure (``dict`` type),
> the field has the C type ``size_t``.
>
> Thread on python-dev: `About dictionary lookup caching
> <https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_.
>
>
> Guard against changing dict during iteration
> --------------------------------------------
>
> In 2013, Serhiy Storchaka proposed `Guard against changing dict during
> iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which
> adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict``
> type), the field has the C type ``size_t``.  This field is incremented
> when the dictionary is modified, and so is very similar to the proposed
> dictionary version.
>
> Sadly, the dictionary version proposed in this PEP doesn't help to
> detect dictionary mutation. The dictionary version changes when values
> are replaced, whereas modifying dictionary values while iterating on
> dictionary keys is legit in Python.
>
>
> PySizer
> -------
>
> `PySizer <http://pysizer.8325.org/>`_: a memory profiler for Python,
> Google Summer of Code 2005 project by Nick Smallbone.
>
> This project has a patch for CPython 2.4 which adds ``key_time`` and
> ``value_time`` fields to dictionary entries. It uses a global
> process-wide counter for dictionaries, incremented each time that a
> dictionary is modified. The times are used to decide when child objects
> first appeared in their parent objects.
>
>
> Discussion
> ==========
>
> Thread on the mailing lists:
>
> * python-dev: `PEP 509: Add a private version to dict
>   <https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_
>   (january 2016)
> * python-ideas: `RFC: PEP: Add dict.__version__
>   <https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_
>   (january 2016)
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org



-- 
--Guido van Rossum (python.org/~guido)


More information about the Python-Dev mailing list