[Python-Dev] RFC: PEP 509: Add a private version to dict

Victor Stinner victor.stinner at gmail.com
Thu Apr 14 11:19:42 EDT 2016


Hi,

I updated my PEP 509 to make the dictionary version globally unique.
With *two* use cases of this PEP (Yury's method call patch and my FAT
Python project), I think that the PEP is now ready to be accepted.

Globally unique identifier is a requirement for Yury's patch
optimizing method calls ( https://bugs.python.org/issue26110 ). It
allows to check for free if the dictionary was replaced.

I also renamed the ma_version field to ma_version_tag.

HTML version:
https://www.python.org/dev/peps/pep-0509/

Victor


PEP: 509
Title: Add a private version to dict
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6


Abstract
========

Add a new private version to the builtin ``dict`` type, incremented at
each dictionary creation and at each dictionary change, to implement
fast guards on namespaces.


Rationale
=========

In Python, the builtin ``dict`` type is used by many instructions. For
example, the ``LOAD_GLOBAL`` instruction searchs for a variable in the
global namespace, or in the builtins namespace (two dict lookups).
Python uses ``dict`` for the builtins namespace, globals namespace, type
namespaces, instance namespaces, etc. The local namespace (namespace of
a function) is usually optimized to an array, but it can be a dict too.

Python is hard to optimize because almost everything is mutable: builtin
functions, function code, global variables, local variables, ... can be
modified at runtime. Implementing optimizations respecting the Python
semantics requires to detect when "something changes": we will call
these checks "guards".

The speedup of optimizations depends on the speed of guard checks. This
PEP proposes to add a version to dictionaries to implement fast guards
on namespaces.

Dictionary lookups can be skipped if the version does not change which
is the common case for most namespaces. Since the version is globally
unique, the version is also enough to check if the namespace dictionary
was not replaced with a new dictionary. The performance of a guard does
not depend on the number of watched dictionary entries, complexity of
O(1), if the dictionary version does not change.

Example of optimization: copy the value of a global variable to function
constants.  This optimization requires a guard on the global variable to
check if it was modified. If the variable is modified, the variable must
be loaded at runtime when the function is called, instead of using the
constant.

See the `PEP 510 -- Specialized functions with guards
<https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of
guards to specialize functions and for the rationale on Python static
optimizers.


Guard example
=============

Pseudo-code of an fast guard to check if a dictionary entry was modified
(created, updated or deleted) using an hypothetical
``dict_get_version(dict)`` function::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.value = dict.get(key, UNSET)
            self.version = dict_get_version(dict)

        def check(self):
            """Return True if the dictionary entry did not changed
            and the dictionary was not replaced."""

            # read the version of the dict structure
            version = dict_get_version(self.dict)
            if version == self.version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            value = self.dict.get(self.key, UNSET)
            if value is self.value:
                # another key was modified:
                # cache the new dictionary version
                self.version = version
                return True

            # the key was modified
            return False


Usage of the dict version
=========================

Speedup method calls 1.2x
-------------------------

Yury Selivanov wrote a `patch to optimize method calls
<https://bugs.python.org/issue26110>`_. The patch depends on the
`implement per-opcode cache in ceval
<https://bugs.python.org/issue26219>`_ patch which requires dictionary
versions to invalidate the cache if the globals dictionary or the
builtins dictionary has been modified.

The cache also requires that the dictionary version is globally unique.
It is possible to define a function in a namespace and call it
in a different namespace: using ``exec()`` with the *globals* parameter
for example. In this case, the globals dictionary was changed and the
cache must be invalidated.


Specialized functions using guards
----------------------------------

The `PEP 510 -- Specialized functions with guards
<https://www.python.org/dev/peps/pep-0510/>`_ proposes an API to support
specialized functions with guards. It allows to implement static
optimizers for Python without breaking the Python semantics.

Example of a static Python optimizer: the `fatoptimizer
<http://fatoptimizer.readthedocs.org/>`_ of the `FAT Python
<http://faster-cpython.readthedocs.org/fat_python.html>`_ project
implements many optimizations which require guards on namespaces.
Examples:

* Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
  ``builtins.__dict__['len']`` and ``globals()['len']`` are required
* Loop unrolling: to unroll the loop ``for i in range(...): ...``,
  guards on ``builtins.__dict__['range']`` and ``globals()['range']``
  are required


Pyjion
------

According of Brett Cannon, one of the two main developers of Pyjion,
Pyjion can also benefit from dictionary version to implement
optimizations.

Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET
Core runtime).


Unladen Swallow
---------------

Even if dictionary version was not explicitly mentioned, optimizing
globals and builtins lookup was part of the Unladen Swallow plan:
"Implement one of the several proposed schemes for speeding lookups of
globals and builtins." Source: `Unladen Swallow ProjectPlan
<https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_.

Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler
implemented with LLVM. The project stopped in 2011: `Unladen Swallow
Retrospective
<http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html>`_.


Changes
=======

Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
the C type ``PY_INT64_T``, 64-bit unsigned integer. Add also a global
dictionary version. Each time a dictionary is created, the global
version is incremented and the dictionary version is initialized to the
global version. The global version is also incremented and copied to the
dictionary version at each dictionary change:

* ``clear()`` if the dict was non-empty
* ``pop(key)`` if the key exists
* ``popitem()`` if the dict is non-empty
* ``setdefault(key, value)`` if the `key` does not exist
* ``__detitem__(key)`` if the key exists
* ``__setitem__(key, value)`` if the `key` doesn't exist or if the value
  is not ``dict[key]``
* ``update(...)`` if new values are different than existing values:
  values are compared by identity, not by their content; the version can
  be incremented multiple times

The ``PyDictObject`` structure is not part of the stable ABI.

The field is called ``ma_version_tag`` rather than ``ma_version`` to
suggest to compare it using ``version_tag == old_version_tag`` rather
than ``version <= old_version`` which makes the integer overflow much
likely.

Example using an hypothetical ``dict_get_version(dict)`` function::

    >>> d = {}
    >>> dict_get_version(d)
    100
    >>> d['key'] = 'value'
    >>> dict_get_version(d)
    101
    >>> d['key'] = 'new value'
    >>> dict_get_version(d)
    102
    >>> del d['key']
    >>> dict_get_version(d)
    103

The version is not incremented if an existing key is set to the same
value. For efficiency, values are compared by their identity:
``new_value is old_value``, not by their content:
``new_value == old_value``. Example::

    >>> d = {}
    >>> value = object()
    >>> d['key'] = value
    >>> dict_get_version(d)
    40
    >>> d['key'] = value
    >>> dict_get_version(d)
    40

.. note::
   CPython uses some singleton like integers in the range [-5; 257],
   empty tuple, empty strings, Unicode strings of a single character in
   the range [U+0000; U+00FF], etc. When a key is set twice to the same
   singleton, the version is not modified.


Implementation and Performance
==============================

The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject
<https://bugs.python.org/issue26058>`_ contains a patch implementing
this PEP.

On pybench and timeit microbenchmarks, the patch does not seem to add
any overhead on dictionary operations.

When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for
a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover,
a guard can watch for multiple keys. For example, for an optimization
using 10 global variables in a function, 10 dictionary lookups costs 148
ns, whereas the guard still only costs 3.8 ns when the version does not
change (39x as fast).

The `fat module
<http://fatoptimizer.readthedocs.org/en/latest/fat.html>`_ implements
such guards: ``fat.GuardDict`` is based on the dictionary version.


Integer overflow
================

The implementation uses the C type ``PY_UINT64_T`` to store the version:
a 64 bits unsigned integer. The C code uses ``version++``. On integer
overflow, the version is wrapped to ``0`` (and then continue to be
incremented) according to the C standard.

After an integer overflow, a guard can succeed whereas the watched
dictionary key was modified. The bug only occurs at a guard check if
there are exaclty ``2 ** 64`` dictionary creations or modifications
since the previous guard check.

If a dictionary is modified every nanosecond, ``2 ** 64`` modifications
takes longer than 584 years. Using a 32-bit version, it only takes 4
seconds. That's why a 64-bit unsigned type is also used on 32-bit
systems. A dictionary lookup at the C level takes 14.8 ns.

A risk of a bug every 584 years is acceptable.


Alternatives
============

Expose the version at Python level as a read-only __version__ property
----------------------------------------------------------------------

The first version of the PEP proposed to expose the dictionary version
as a read-only ``__version__`` property at Python level, and also to add
the property to ``collections.UserDict`` (since this type must mimick
the ``dict`` API).

There are multiple issues:

* To be consistent and avoid bad surprises, the version must be added to
  all mapping types. Implementing a new mapping type would require extra
  work for no benefit, since the version is only required on the
  ``dict`` type in practice.
* All Python implementations must implement this new property, it gives
  more work to other implementations, whereas they may not use the
  dictionary version at all.
* Exposing the dictionary version at Python level can lead the
  false assumption on performances. Checking ``dict.__version__`` at
  the Python level is not faster than a dictionary lookup. A dictionary
  lookup has a cost of 48.7 ns and checking a guard has a cost of 47.5
  ns, the difference is only 1.2 ns (3%)::


    $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
    10000000 loops, best of 3: 0.0487 usec per loop
    $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}'
'd.__version__ == 100'
    10000000 loops, best of 3: 0.0475 usec per loop

* The ``__version__`` can be wrapped on integer overflow. It is error
  prone: using ``dict.__version__ <= guard_version`` is wrong,
  ``dict.__version__ == guard_version`` must be used instead to reduce
  the risk of bug on integer overflow (even if the integer overflow is
  unlikely in practice).

Mandatory bikeshedding on the property name:

* ``__cache_token__``: name proposed by Nick Coghlan, name coming from
  `abc.get_cache_token()
  <https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_.
* ``__version__``
* ``__timestamp__``


Add a version to each dict entry
--------------------------------

A single version per dictionary requires to keep a strong reference to
the value which can keep the value alive longer than expected. If we add
also a version per dictionary entry, the guard can only store the entry
version to avoid the strong reference to the value (only strong
references to the dictionary and to the key are needed).

Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure,
the field has the C type ``PY_INT64_T``. When a key is created or
modified, the entry version is set to the dictionary version which is
incremented at any change (create, modify, delete).

Pseudo-code of an fast guard to check if a dictionary key was modified
using hypothetical ``dict_get_version(dict)``
``dict_get_entry_version(dict)`` functions::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.dict_version = dict_get_version(dict)
            self.entry_version = dict_get_entry_version(dict, key)

        def check(self):
            """Return True if the dictionary entry did not changed
            and the dictionary was not replaced."""

            # read the version of the dict structure
            dict_version = dict_get_version(self.dict)
            if dict_version == self.version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            entry_version = get_dict_key_version(dict, key)
            if entry_version == self.entry_version:
                # another key was modified:
                # cache the new dictionary version
                self.dict_version = dict_version
                return True

            # the key was modified
            return False

The main drawback of this option is the impact on the memory footprint.
It increases the size of each dictionary entry, so the overhead depends
on the number of buckets (dictionary entries, used or unused yet). For
example, it increases the size of each dictionary entry by 8 bytes on
64-bit system.

In Python, the memory footprint matters and the trend is to reduce it.
Examples:

* `PEP 393 -- Flexible String Representation
  <https://www.python.org/dev/peps/pep-0393/>`_
* `PEP 412 -- Key-Sharing Dictionary
  <https://www.python.org/dev/peps/pep-0412/>`_


Add a new dict subtype
----------------------

Add a new ``verdict`` type, subtype of ``dict``. When guards are needed,
use the ``verdict`` for namespaces (module namespace, type namespace,
instance namespace, etc.) instead of ``dict``.

Leave the ``dict`` type unchanged to not add any overhead (memory
footprint) when guards are not needed.

Technical issue: a lot of C code in the wild, including CPython core,
expecting the exact ``dict`` type. Issues:

* ``exec()`` requires a ``dict`` for globals and locals. A lot of code
  use ``globals={}``. It is not possible to cast the ``dict`` to a
  ``dict`` subtype because the caller expects the ``globals`` parameter
  to be modified (``dict`` is mutable).
* Functions call directly ``PyDict_xxx()`` functions, instead of calling
  ``PyObject_xxx()`` if the object is a ``dict`` subtype
* ``PyDict_CheckExact()`` check fails on ``dict`` subtype, whereas some
  functions require the exact ``dict`` type.
* ``Python/ceval.c`` does not completely supports dict subtypes for
  namespaces


The ``exec()`` issue is a blocker issue.

Other issues:

* The garbage collector has a special code to "untrack" ``dict``
  instances. If a ``dict`` subtype is used for namespaces, the garbage
  collector can be unable to break some reference cycles.
* Some functions have a fast-path for ``dict`` which would not be taken
  for ``dict`` subtypes, and so it would make Python a little bit
  slower.


Prior Art
=========

Method cache and type version tag
---------------------------------

In 2007, Armin Rigo wrote a patch to to implement a cache of methods. It
was merged into Python 2.6.  The patch adds a "type attribute cache
version tag" (``tp_version_tag``) and a "valid version tag" flag to
types (the ``PyTypeObject`` structure).

The type version tag is not available at the Python level.

The version tag has the C type ``unsigned int``. The cache is a global
hash table of 4096 entries, shared by all types. The cache is global to
"make it fast, have a deterministic and low memory footprint, and be
easy to invalidate". Each cache entry has a version tag. A global
version tag is used to create the next version tag, it also has the C
type ``unsigned int``.

By default, a type has its "valid version tag" flag cleared to indicate
that the version tag is invalid. When the first method of the type is
cached, the version tag and the "valid version tag" flag are set. When a
type is modified, the "valid version tag" flag of the type and its
subclasses is cleared. Later, when a cache entry of these types is used,
the entry is removed because its version tag is outdated.

On integer overflow, the whole cache is cleared and the global version
tag is reset to ``0``.

See `Method cache (issue #1685986)
<https://bugs.python.org/issue1685986>`_ and `Armin's method cache
optimization updated for Python 2.6 (issue #1700288)
<https://bugs.python.org/issue1700288>`_.


Globals / builtins cache
------------------------

In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue
#10401) <http://bugs.python.org/issue10401>`_ which adds a private
``ma_version`` field to the ``PyDictObject`` structure (``dict`` type),
the field has the C type ``Py_ssize_t``.

The patch adds a "global and builtin cache" to functions and frames, and
changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the
cache.

The change on the ``PyDictObject`` structure is very similar to this
PEP.


Cached globals+builtins lookup
------------------------------

In 2006, Andrea Griffini proposed a patch implementing a `Cached
globals+builtins lookup optimization
<https://bugs.python.org/issue1616125>`_.  The patch adds a private
``timestamp`` field to the ``PyDictObject`` structure (``dict`` type),
the field has the C type ``size_t``.

Thread on python-dev: `About dictionary lookup caching
<https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_.


Guard against changing dict during iteration
--------------------------------------------

In 2013, Serhiy Storchaka proposed `Guard against changing dict during
iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which
adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict``
type), the field has the C type ``size_t``.  This field is incremented
when the dictionary is modified, and so is very similar to the proposed
dictionary version.

Sadly, the dictionary version proposed in this PEP doesn't help to
detect dictionary mutation. The dictionary version changes when values
are replaced, whereas modifying dictionary values while iterating on
dictionary keys is legit in Python.


PySizer
-------

`PySizer <http://pysizer.8325.org/>`_: a memory profiler for Python,
Google Summer of Code 2005 project by Nick Smallbone.

This project has a patch for CPython 2.4 which adds ``key_time`` and
``value_time`` fields to dictionary entries. It uses a global
process-wide counter for dictionaries, incremented each time that a
dictionary is modified. The times are used to decide when child objects
first appeared in their parent objects.


Discussion
==========

Thread on the mailing lists:

* python-dev: `PEP 509: Add a private version to dict
  <https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_
  (january 2016)
* python-ideas: `RFC: PEP: Add dict.__version__
  <https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_
  (january 2016)


Copyright
=========

This document has been placed in the public domain.


More information about the Python-Dev mailing list