PEP 509: Add a private version to dict
Hi,

After a first round on python-ideas, here is the second version of my PEP. The main changes since the first version are that the dictionary version is no longer exposed at the Python level and the field now also has a size of 64 bits on 32-bit platforms.

The PEP is part of a series of 3 PEPs adding an API to implement a static Python optimizer specializing functions with guards. The second PEP is currently being discussed on python-ideas and I'm still working on the third PEP.

Thanks to Red Hat for giving me time to experiment on this.

HTML version: https://www.python.org/dev/peps/pep-0509/

PEP: 509
Title: Add a private version to dict
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6

Abstract
========

Add a new private version to the builtin ``dict`` type, incremented at each change, to implement fast guards on namespaces.

Rationale
=========

In Python, the builtin ``dict`` type is used by many instructions. For example, the ``LOAD_GLOBAL`` instruction searches for a variable in the global namespace, or in the builtins namespace (two dict lookups). Python uses ``dict`` for the builtins namespace, globals namespace, type namespaces, instance namespaces, etc. The local namespace (the namespace of a function) is usually optimized to an array, but it can be a dict too.

Python is hard to optimize because almost everything is mutable: builtin functions, function code, global variables, local variables, ... can be modified at runtime. Implementing optimizations respecting the Python semantics requires detecting when "something changes": we will call these checks "guards".

The speedup of optimizations depends on the speed of guard checks. This PEP proposes to add a version to dictionaries to implement fast guards on namespaces.

Dictionary lookups can be skipped if the version does not change, which is the common case for most namespaces.
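The two dict lookups performed by ``LOAD_GLOBAL`` can be modelled in pure Python. This is an illustrative sketch only (CPython implements the opcode in C inside the eval loop); ``load_global()`` is a hypothetical helper, not a real API::

    import builtins

    def load_global(globals_dict, name):
        # LOAD_GLOBAL semantics: search the module globals first, then
        # fall back to the builtins namespace -- up to two dict lookups,
        # repeated every time the opcode runs.
        try:
            return globals_dict[name]
        except KeyError:
            try:
                return builtins.__dict__[name]
            except KeyError:
                raise NameError(name) from None

A guard on both namespaces would let an optimizer skip these lookups entirely as long as neither dictionary version changed.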
The performance of a guard does not depend on the number of watched dictionary entries (the complexity is O(1)) if the dictionary version does not change.

Example of optimization: copy the value of a global variable to function constants. This optimization requires a guard on the global variable to check if it was modified. If the variable is modified, the variable must be loaded at runtime when the function is called, instead of using the constant.

See `PEP 510 -- Specialized functions with guards <https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of guards to specialize functions and for the rationale on Python static optimizers.

Guard example
=============

Pseudo-code of a fast guard to check if a dictionary entry was modified (created, updated or deleted) using a hypothetical ``dict_get_version(dict)`` function::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.value = dict.get(key, UNSET)
            self.version = dict_get_version(dict)

        def check(self):
            """Return True if the dictionary entry did not change."""

            # read the version field of the dict structure
            version = dict_get_version(self.dict)
            if version == self.version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            value = self.dict.get(self.key, UNSET)
            if value is self.value:
                # another key was modified:
                # cache the new dictionary version
                self.version = version
                return True

            # the key was modified
            return False

Usage of the dict version
=========================

Specialized functions using guards
----------------------------------

`PEP 510 -- Specialized functions with guards <https://www.python.org/dev/peps/pep-0510/>`_ proposes an API to support specialized functions with guards. It allows implementing static optimizers for Python without breaking the Python semantics.
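The guard logic can be exercised today by emulating the proposed version field with a dict subclass. ``VersionedDict`` below is purely illustrative (it bumps a counter on every write, without the identity comparison the PEP proposes for ``__setitem__``); the guard class follows the pseudo-code above::

    UNSET = object()

    class VersionedDict(dict):
        """Illustrative stand-in for the proposed C-level ma_version field."""
        def __init__(self):
            super().__init__()
            self.version = 0

        def __setitem__(self, key, value):
            super().__setitem__(key, value)
            self.version += 1

        def __delitem__(self, key):
            super().__delitem__(key)
            self.version += 1

    class GuardDictKey:
        def __init__(self, d, key):
            self.dict = d
            self.key = key
            self.value = d.get(key, UNSET)
            self.version = d.version

        def check(self):
            """Return True if the watched entry did not change."""
            version = self.dict.version
            if version == self.version:
                return True             # fast path: no lookup needed
            value = self.dict.get(self.key, UNSET)
            if value is self.value:
                self.version = version  # only other keys changed: re-arm
                return True
            return False                # the watched key was modified

Modifying an unrelated key forces one slow-path lookup, after which the guard caches the new version and the fast path applies again.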
Example of a static Python optimizer: the astoptimizer of the `FAT Python <http://faster-cpython.readthedocs.org/fat_python.html>`_ project implements many optimizations which require guards on namespaces. Examples:

* Call pure builtins: to replace ``len("abc")`` with ``3``, guards on ``builtins.__dict__['len']`` and ``globals()['len']`` are required
* Loop unrolling: to unroll the loop ``for i in range(...): ...``, guards on ``builtins.__dict__['range']`` and ``globals()['range']`` are required

Pyjion
------

According to Brett Cannon, one of the two main developers of Pyjion, Pyjion can also benefit from the dictionary version to implement optimizations.

Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET Core runtime).

Unladen Swallow
---------------

Even if the dictionary version was not explicitly mentioned, optimizing globals and builtins lookup was part of the Unladen Swallow plan: "Implement one of the several proposed schemes for speeding lookups of globals and builtins." Source: `Unladen Swallow ProjectPlan <https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_.

Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler implemented with LLVM. The project stopped in 2011: `Unladen Swallow Retrospective <http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html>`_.

Changes
=======

Add a ``ma_version`` field to the ``PyDictObject`` structure with the C type ``PY_INT64_T``, a 64-bit unsigned integer. New empty dictionaries are initialized to version ``0``.
The version is incremented at each change:

* ``clear()`` if the dict was non-empty
* ``pop(key)`` if the key exists
* ``popitem()`` if the dict is non-empty
* ``setdefault(key, value)`` if the key does not exist
* ``__delitem__(key)`` if the key exists
* ``__setitem__(key, value)`` if the key doesn't exist or if the value is different
* ``update(...)`` if new values are different than existing values (the version can be incremented multiple times)

Example using a hypothetical ``dict_get_version(dict)`` function::

    >>> d = {}
    >>> dict_get_version(d)
    0
    >>> d['key'] = 'value'
    >>> dict_get_version(d)
    1
    >>> d['key'] = 'new value'
    >>> dict_get_version(d)
    2
    >>> del d['key']
    >>> dict_get_version(d)
    3

If a dictionary is created with items, the version is also incremented at each dictionary insertion. Example::

    >>> d = dict(x=7, y=33)
    >>> dict_get_version(d)
    2

The version is not incremented if an existing key is set to the same value. For efficiency, values are compared by their identity: ``new_value is old_value``, not by their content: ``new_value == old_value``. Example::

    >>> d = {}
    >>> value = object()
    >>> d['key'] = value
    >>> dict_get_version(d)
    1
    >>> d['key'] = value
    >>> dict_get_version(d)
    1

.. note::
   CPython uses some singletons, like integers in the range [-5; 256], the empty tuple, empty strings, Unicode strings of a single character in the range [U+0000; U+00FF], etc. When a key is set twice to the same singleton, the version is not modified.

Implementation
==============

The `issue #26058: PEP 509: Add ma_version to PyDictObject <https://bugs.python.org/issue26058>`_ contains a patch implementing this PEP.

On pybench and timeit microbenchmarks, the patch does not seem to add any overhead on dictionary operations.

When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover, a guard can watch for multiple keys.
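Since a single version check covers any number of watched keys, the benefit grows with the number of keys. A quick back-of-the-envelope check, using the measured costs above::

    LOOKUP_NS = 14.8   # PyDict_GetItem() cost per lookup (measured above)
    GUARD_NS = 3.8     # cost of one version check (measured above)

    n_globals = 10
    lookup_cost = n_globals * LOOKUP_NS   # 148 ns: one lookup per global
    speedup = lookup_cost / GUARD_NS      # one version check covers all keys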
For example, for an optimization using 10 global variables in a function, 10 dictionary lookups cost 148 ns, whereas the guard still only costs 3.8 ns when the version does not change (39x as fast).

Integer overflow
================

The implementation uses the C unsigned integer type ``PY_INT64_T`` to store the version, a 64-bit unsigned integer. The C code uses ``version++``. On integer overflow, the version wraps to ``0`` (and then continues to be incremented) according to the C standard.

After an integer overflow, a guard can succeed whereas the watched dictionary key was modified. The bug occurs if the dictionary is modified at least ``2 ** 64`` times between two checks of the guard and if the new version (the theoretical value with no integer overflow) is equal to the old version modulo ``2 ** 64``.

If a dictionary is modified each nanosecond, an overflow takes longer than 584 years. Using a 32-bit version, the overflow would occur after only 4 seconds. That's why a 64-bit unsigned type is used on 32-bit systems too. For comparison, a dictionary lookup at the C level takes 14.8 ns.

A risk of a bug every 584 years is acceptable.

Alternatives
============

Expose the version at Python level as a read-only __version__ property
----------------------------------------------------------------------

The first version of the PEP proposed to expose the dictionary version as a read-only ``__version__`` property at the Python level, and also to add the property to ``collections.UserDict`` (since this type must mimic the ``dict`` API).

There are multiple issues:

* To be consistent and avoid bad surprises, the version must be added to all mapping types. Implementing a new mapping type would require extra work for no benefit, since the version is only required on the ``dict`` type in practice.
* All Python implementations would have to implement this new property; it gives more work to other implementations, whereas they may not use the dictionary version at all.
* The ``__version__`` can wrap on integer overflow. It is error prone: using ``dict.__version__ <= guard_version`` is wrong; ``dict.__version__ == guard_version`` must be used instead to reduce the risk of a bug on integer overflow (even if an integer overflow is unlikely in practice).
* Exposing the dictionary version at the Python level can lead to false assumptions about performance. Checking ``dict.__version__`` at the Python level is not faster than a dictionary lookup. A dictionary lookup has a cost of 48.7 ns and checking a guard has a cost of 47.5 ns; the difference is only 1.2 ns (3%)::

      $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
      10000000 loops, best of 3: 0.0487 usec per loop
      $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
      10000000 loops, best of 3: 0.0475 usec per loop

Bikeshedding on the property name:

* ``__cache_token__``: name proposed by Nick Coghlan, coming from `abc.get_cache_token() <https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_.
* ``__version__``
* ``__timestamp__``

Add a version to each dict entry
--------------------------------

A single version per dictionary requires the guard to keep a strong reference to the value, which can keep the value alive longer than expected. If we also add a version per dictionary entry, the guard can store only the entry version and avoid the strong reference to the value (only strong references to the dictionary and to the key are needed).

Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure; the field has the C type ``PY_INT64_T``. When a key is created or modified, the entry version is set to the dictionary version, which is incremented at any change (create, modify, delete).
Pseudo-code of a fast guard to check if a dictionary key was modified, using hypothetical ``dict_get_version(dict)`` and ``dict_get_entry_version(dict, key)`` functions::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.dict_version = dict_get_version(dict)
            self.entry_version = dict_get_entry_version(dict, key)

        def check(self):
            """Return True if the dictionary entry did not change."""

            # read the version field of the dict structure
            dict_version = dict_get_version(self.dict)
            if dict_version == self.dict_version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            entry_version = dict_get_entry_version(self.dict, self.key)
            if entry_version == self.entry_version:
                # another key was modified:
                # cache the new dictionary version
                self.dict_version = dict_version
                return True

            # the key was modified
            return False

The main drawback of this option is the impact on the memory footprint. It increases the size of each dictionary entry, so the overhead depends on the number of buckets (dictionary entries, used or not used yet). For example, it increases the size of each dictionary entry by 8 bytes on 64-bit systems.

In Python, the memory footprint matters and the trend is to reduce it. Examples:

* `PEP 393 -- Flexible String Representation <https://www.python.org/dev/peps/pep-0393/>`_
* `PEP 412 -- Key-Sharing Dictionary <https://www.python.org/dev/peps/pep-0412/>`_

Add a new dict subtype
----------------------

Add a new ``verdict`` type, a subtype of ``dict``. When guards are needed, use ``verdict`` for namespaces (module namespace, type namespace, instance namespace, etc.) instead of ``dict``.

Leave the ``dict`` type unchanged to avoid adding any overhead (memory footprint) when guards are not needed.

Technical issue: a lot of C code in the wild, including CPython core, expects the exact ``dict`` type. Issues:

* ``exec()`` requires a ``dict`` for globals and locals. A lot of code uses ``globals={}``.
  It is not possible to cast the ``dict`` to a ``dict`` subtype because the caller expects the ``globals`` parameter to be modified (``dict`` is mutable).
* Functions call ``PyDict_xxx()`` functions directly, instead of calling ``PyObject_xxx()`` if the object is a ``dict`` subtype.
* The ``PyDict_CheckExact()`` check fails on a ``dict`` subtype, whereas some functions require the exact ``dict`` type.
* ``Python/ceval.c`` does not completely support dict subtypes for namespaces.

The ``exec()`` issue is a blocker issue.

Other issues:

* The garbage collector has special code to "untrack" ``dict`` instances. If a ``dict`` subtype is used for namespaces, the garbage collector can be unable to break some reference cycles.
* Some functions have a fast-path for ``dict`` which would not be taken for ``dict`` subtypes, and so it would make Python a little bit slower.

Prior Art
=========

Method cache and type version tag
---------------------------------

In 2007, Armin Rigo wrote a patch to implement a cache of methods. It was merged into Python 2.6. The patch adds a "type attribute cache version tag" (``tp_version_tag``) and a "valid version tag" flag to types (the ``PyTypeObject`` structure).

The type version tag is not available at the Python level.

The version tag has the C type ``unsigned int``. The cache is a global hash table of 4096 entries, shared by all types. The cache is global to "make it fast, have a deterministic and low memory footprint, and be easy to invalidate". Each cache entry has a version tag. A global version tag is used to create the next version tag; it also has the C type ``unsigned int``.

By default, a type has its "valid version tag" flag cleared to indicate that the version tag is invalid. When the first method of the type is cached, the version tag and the "valid version tag" flag are set. When a type is modified, the "valid version tag" flag of the type and its subclasses is cleared.
Later, when a cache entry of one of these types is used, the entry is removed because its version tag is outdated.

On integer overflow, the whole cache is cleared and the global version tag is reset to ``0``.

See `Method cache (issue #1685986) <https://bugs.python.org/issue1685986>`_ and `Armin's method cache optimization updated for Python 2.6 (issue #1700288) <https://bugs.python.org/issue1700288>`_.

Globals / builtins cache
------------------------

In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue #10401) <http://bugs.python.org/issue10401>`_ which adds a private ``ma_version`` field to the ``PyDictObject`` structure (``dict`` type); the field has the C type ``Py_ssize_t``. The patch adds a "global and builtin cache" to functions and frames, and changes the ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the cache.

The change to the ``PyDictObject`` structure is very similar to this PEP.

Cached globals+builtins lookup
------------------------------

In 2006, Andrea Griffini proposed a patch implementing a `Cached globals+builtins lookup optimization <https://bugs.python.org/issue1616125>`_. The patch adds a private ``timestamp`` field to the ``PyDictObject`` structure (``dict`` type); the field has the C type ``size_t``.

Thread on python-dev: `About dictionary lookup caching <https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_.

Guard against changing dict during iteration
--------------------------------------------

In 2013, Serhiy Storchaka proposed `Guard against changing dict during iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict`` type); the field has the C type ``size_t``. This field is incremented when the dictionary is modified, and so is very similar to the proposed dictionary version.

Sadly, the dictionary version proposed in this PEP doesn't help to detect dictionary mutation.
The dictionary version changes when values are replaced, whereas modifying dictionary values while iterating over dictionary keys is legitimate in Python.

PySizer
-------

`PySizer <http://pysizer.8325.org/>`_: a memory profiler for Python, a Google Summer of Code 2005 project by Nick Smallbone.

This project has a patch for CPython 2.4 which adds ``key_time`` and ``value_time`` fields to dictionary entries. It uses a global process-wide counter for dictionaries, incremented each time a dictionary is modified. The times are used to decide when child objects first appeared in their parent objects.

Discussion
==========

Thread on the python-ideas mailing list: `RFC: PEP: Add dict.__version__ <https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_.

Copyright
=========

This document has been placed in the public domain.
Hi Victor. You know that PyPy does this stuff without changing or exposing Python semantics, right? We have a versioned dict that does not leak abstractions to the user. In general, doing stuff like that, where a public API leaks details of certain optimizations, makes it harder and harder for optimizing compilers to do their job properly if you want to do something slightly different. Can we make this happen (as you noted in the prior art) WITHOUT changing ANY of the things exposed to the user? On Mon, Jan 11, 2016 at 6:49 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,
After a first round on python-ideas, here is the second version of my PEP. The main changes since the first version are that the dictionary version is no more exposed at the Python level and the field type now also has a size of 64-bit on 32-bit platforms.
The PEP is part of a serie of 3 PEP adding an API to implement a static Python optimizer specializing functions with guards. The second PEP is currently discussed on python-ideas and I'm still working on the third PEP.
Thanks to Red Hat for giving me time to experiment on this.
HTML version: https://www.python.org/dev/peps/pep-0509/
PEP: 509 Title: Add a private version to dict Version: $Revision$ Last-Modified: $Date$ Author: Victor Stinner <victor.stinner@gmail.com> Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 4-January-2016 Python-Version: 3.6
Abstract ========
Add a new private version to builtin ``dict`` type, incremented at each change, to implement fast guards on namespaces.
Rationale =========
In Python, the builtin ``dict`` type is used by many instructions. For example, the ``LOAD_GLOBAL`` instruction searchs for a variable in the global namespace, or in the builtins namespace (two dict lookups). Python uses ``dict`` for the builtins namespace, globals namespace, type namespaces, instance namespaces, etc. The local namespace (namespace of a function) is usually optimized to an array, but it can be a dict too.
Python is hard to optimize because almost everything is mutable: builtin functions, function code, global variables, local variables, ... can be modified at runtime. Implementing optimizations respecting the Python semantics requires to detect when "something changes": we will call these checks "guards".
The speedup of optimizations depends on the speed of guard checks. This PEP proposes to add a version to dictionaries to implement fast guards on namespaces.
Dictionary lookups can be skipped if the version does not change which is the common case for most namespaces. The performance of a guard does not depend on the number of watched dictionary entries, complexity of O(1), if the dictionary version does not change.
Example of optimization: copy the value of a global variable to function constants. This optimization requires a guard on the global variable to check if it was modified. If the variable is modified, the variable must be loaded at runtime when the function is called, instead of using the constant.
See the `PEP 510 -- Specialized functions with guards <https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of guards to specialize functions and for the rationale on Python static optimizers.
Guard example =============
Pseudo-code of an fast guard to check if a dictionary entry was modified (created, updated or deleted) using an hypothetical ``dict_get_version(dict)`` function::
UNSET = object()
class GuardDictKey: def __init__(self, dict, key): self.dict = dict self.key = key self.value = dict.get(key, UNSET) self.version = dict_get_version(dict)
def check(self): """Return True if the dictionary entry did not changed."""
# read the version field of the dict structure version = dict_get_version(self.dict) if version == self.version: # Fast-path: dictionary lookup avoided return True
# lookup in the dictionary value = self.dict.get(self.key, UNSET) if value is self.value: # another key was modified: # cache the new dictionary version self.version = version return True
# the key was modified return False
Usage of the dict version =========================
Specialized functions using guards ----------------------------------
The `PEP 510 -- Specialized functions with guards <https://www.python.org/dev/peps/pep-0510/>`_ proposes an API to support specialized functions with guards. It allows to implement static optimizers for Python without breaking the Python semantics.
Example of a static Python optimizer: the astoptimizer of the `FAT Python <http://faster-cpython.readthedocs.org/fat_python.html>`_ project implements many optimizations which require guards on namespaces. Examples:
* Call pure builtins: to replace ``len("abc")`` with ``3``, guards on ``builtins.__dict__['len']`` and ``globals()['len']`` are required * Loop unrolling: to unroll the loop ``for i in range(...): ...``, guards on ``builtins.__dict__['range']`` and ``globals()['range']`` are required
Pyjion ------
According of Brett Cannon, one of the two main developers of Pyjion, Pyjion can also benefit from dictionary version to implement optimizations.
Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET Core runtime).
Unladen Swallow ---------------
Even if dictionary version was not explicitly mentionned, optimization globals and builtins lookup was part of the Unladen Swallow plan: "Implement one of the several proposed schemes for speeding lookups of globals and builtins." Source: `Unladen Swallow ProjectPlan <https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_.
Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler implemented with LLVM. The project stopped in 2011: `Unladen Swallow Retrospective <http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html>`_.
Changes =======
Add a ``ma_version`` field to the ``PyDictObject`` structure with the C type ``PY_INT64_T``, 64-bit unsigned integer. New empty dictionaries are initilized to version ``0``. The version is incremented at each change:
* ``clear()`` if the dict was non-empty * ``pop(key)`` if the key exists * ``popitem()`` if the dict is non-empty * ``setdefault(key, value)`` if the `key` does not exist * ``__detitem__(key)`` if the key exists * ``__setitem__(key, value)`` if the `key` doesn't exist or if the value is different * ``update(...)`` if new values are different than existing values (the version can be incremented multiple times)
Example using an hypothetical ``dict_get_version(dict)`` function::
>>> d = {} >>> dict_get_version(d) 0 >>> d['key'] = 'value' >>> dict_get_version(d) 1 >>> d['key'] = 'new value' >>> dict_get_version(d) 2 >>> del d['key'] >>> dict_get_version(d) 3
If a dictionary is created with items, the version is also incremented at each dictionary insertion. Example::
>>> d=dict(x=7, y=33) >>> dict_get_version(d) 2
The version is not incremented if an existing key is set to the same value. For efficiency, values are compared by their identity: ``new_value is old_value``, not by their content: ``new_value == old_value``. Example::
>>> d={} >>> value = object() >>> d['key'] = value >>> dict_get_version(d) 2 >>> d['key'] = value >>> dict_get_version(d) 2
.. note:: CPython uses some singleton like integers in the range [-5; 257], empty tuple, empty strings, Unicode strings of a single character in the range [U+0000; U+00FF], etc. When a key is set twice to the same singleton, the version is not modified.
Implementation ==============
The `issue #26058: PEP 509: Add ma_version to PyDictObject <https://bugs.python.org/issue26058>`_ contains a patch implementing this PEP.
On pybench and timeit microbenchmarks, the patch does not seem to add any overhead on dictionary operations.
When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for a dictioanry lookup, whereas a guard check only takes 3.8 ns. Moreover, a guard can watch for multiple keys. For example, for an optimization using 10 global variables in a function, 10 dictionary lookups costs 148 ns, whereas the guard still only costs 3.8 ns when the version does not change (39x as fast).
Integer overflow ================
The implementation uses the C unsigned integer type ``PY_INT64_T`` to store the version, a 64 bits unsigned integer. The C code uses ``version++``. On integer overflow, the version is wrapped to ``0`` (and then continue to be incremented) according to the C standard.
After an integer overflow, a guard can succeed whereas the watched dictionary key was modified. The bug occurs if the dictionary is modified at least ``2 ** 64`` times between two checks of the guard and if the new version (theorical value with no integer overflow) is equal to the old version modulo ``2 ** 64``.
If a dictionary is modified each nanosecond, an overflow takes longer than 584 years. Using a 32-bit version, the overflow occurs only after 4 seconds. That's why a 64-bit unsigned type is also used on 32-bit systems. A dictionary lookup at the C level takes 14.8 ns.
A risk of a bug every 584 years is acceptable.
Alternatives ============
Expose the version at Python level as a read-only __version__ property ----------------------------------------------------------------------
The first version of the PEP proposed to expose the dictionary version as a read-only ``__version__`` property at Python level, and also to add the property to ``collections.UserDict`` (since this type must mimick the ``dict`` API).
There are multiple issues:
* To be consistent and avoid bad surprises, the version must be added to all mapping types. Implementing a new mapping type would require extra work for no benefit, since the version is only required on the ``dict`` type in practice. * All Python implementations must implement this new property, it gives more work to other implementations, whereas they may not use the dictionary version at all. * The ``__version__`` can be wrapped on integer overflow. It is error prone: using ``dict.__version__ <= guard_version`` is wrong, ``dict.__version__ == guard_version`` must be used instead to reduce the risk of bug on integer overflow (even if the integer overflow is unlikely in practice). * Exposing the dictionary version at Python level can lead the false assumption on performances. Checking ``dict.__version__`` at the Python level is not faster than a dictionary lookup. A dictionary lookup has a cost of 48.7 ns and checking a guard has a cost of 47.5 ns, the difference is only 1.2 ns (3%)::
$ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33' 10000000 loops, best of 3: 0.0487 usec per loop $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100' 10000000 loops, best of 3: 0.0475 usec per loop
Bikeshedding on the property name:
* ``__cache_token__``: name proposed by Nick Coghlan, name coming from `abc.get_cache_token() <https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_. * ``__version__`` * ``__timestamp__``
Add a version to each dict entry --------------------------------
A single version per dictionary requires to keep a strong reference to the value which can keep the value alive longer than expected. If we add also a version per dictionary entry, the guard can only store the entry version to avoid the strong reference to the value (only strong references to the dictionary and to the key are needed).
Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure, the field has the C type ``PY_INT64_T``. When a key is created or modified, the entry version is set to the dictionary version which is incremented at any change (create, modify, delete).
Pseudo-code of an fast guard to check if a dictionary key was modified using hypothetical ``dict_get_version(dict)`` ``dict_get_entry_version(dict)`` functions::
UNSET = object()
class GuardDictKey: def __init__(self, dict, key): self.dict = dict self.key = key self.dict_version = dict_get_version(dict) self.entry_version = dict_get_entry_version(dict, key)
def check(self): """Return True if the dictionary entry did not changed."""
# read the version field of the dict structure dict_version = dict_get_version(self.dict) if dict_version == self.version: # Fast-path: dictionary lookup avoided return True
# lookup in the dictionary entry_version = get_dict_key_version(dict, key) if entry_version == self.entry_version: # another key was modified: # cache the new dictionary version self.dict_version = dict_version return True
# the key was modified return False
The main drawback of this option is the impact on the memory footprint. It increases the size of each dictionary entry, so the overhead depends on the number of buckets (dictionary entries, used or unused yet). For example, it increases the size of each dictionary entry by 8 bytes on 64-bit system.
In Python, the memory footprint matters and the trend is to reduce it. Examples:
* `PEP 393 -- Flexible String Representation <https://www.python.org/dev/peps/pep-0393/>`_ * `PEP 412 -- Key-Sharing Dictionary <https://www.python.org/dev/peps/pep-0412/>`_
Add a new dict subtype ----------------------
Add a new ``verdict`` type, subtype of ``dict``. When guards are needed, use the ``verdict`` for namespaces (module namespace, type namespace, instance namespace, etc.) instead of ``dict``.
Leave the ``dict`` type unchanged to not add any overhead (memory footprint) when guards are not needed.
Technical issue: a lot of C code in the wild, including CPython core, expecting the exact ``dict`` type. Issues:
* ``exec()`` requires a ``dict`` for globals and locals. A lot of code use ``globals={}``. It is not possible to cast the ``dict`` to a ``dict`` subtype because the caller expects the ``globals`` parameter to be modified (``dict`` is mutable). * Functions call directly ``PyDict_xxx()`` functions, instead of calling ``PyObject_xxx()`` if the object is a ``dict`` subtype * ``PyDict_CheckExact()`` check fails on ``dict`` subtype, whereas some functions require the exact ``dict`` type. * ``Python/ceval.c`` does not completly supports dict subtypes for namespaces
The ``exec()`` issue is a blocker issue.
Other issues:
* The garbage collector has a special code to "untrack" ``dict`` instances. If a ``dict`` subtype is used for namespaces, the garbage collector can be unable to break some reference cycles. * Some functions have a fast-path for ``dict`` which would not be taken for ``dict`` subtypes, and so it would make Python a little bit slower.
Prior Art =========
Method cache and type version tag ---------------------------------
In 2007, Armin Rigo wrote a patch to implement a cache of methods. It was merged into Python 2.6. The patch adds a "type attribute cache version tag" (``tp_version_tag``) and a "valid version tag" flag to types (the ``PyTypeObject`` structure).
The type version tag is not available at the Python level.
The version tag has the C type ``unsigned int``. The cache is a global hash table of 4096 entries, shared by all types. The cache is global to "make it fast, have a deterministic and low memory footprint, and be easy to invalidate". Each cache entry has a version tag. A global version tag is used to create the next version tag; it also has the C type ``unsigned int``.
By default, a type has its "valid version tag" flag cleared to indicate that the version tag is invalid. When the first method of the type is cached, the version tag and the "valid version tag" flag are set. When a type is modified, the "valid version tag" flag of the type and its subclasses is cleared. Later, when a cache entry of these types is used, the entry is removed because its version tag is outdated.
On integer overflow, the whole cache is cleared and the global version tag is reset to ``0``.
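The invalidation scheme described above can be modelled in a few lines of Python. All names here are invented for this sketch; the real cache is implemented in C inside CPython:

```python
# Toy model of the type "version tag" method cache.
next_tag = 0       # global version tag counter
method_cache = {}  # (version tag, method name) -> method

class TypeInfo:
    def __init__(self):
        self.tag = None
        self.tag_valid = False   # "valid version tag" flag, cleared by default
        self.methods = {}
        self.subclasses = []

    def get_tag(self):
        global next_tag
        if not self.tag_valid:
            next_tag += 1        # take a fresh tag from the global counter
            self.tag = next_tag
            self.tag_valid = True
        return self.tag

    def modify(self, name, method):
        self.methods[name] = method
        # a change invalidates this type and all of its subclasses
        self.tag_valid = False
        for sub in self.subclasses:
            sub.tag_valid = False

def lookup(tp, name):
    key = (tp.get_tag(), name)
    if key in method_cache:
        return method_cache[key]     # cache hit: the dict lookup is skipped
    method = tp.methods[name]        # slow path: real lookup, then cache it
    method_cache[key] = method
    return method

tp = TypeInfo()
tp.modify('f', lambda: 'old')
assert lookup(tp, 'f')() == 'old'
tp.modify('f', lambda: 'new')        # the stale cache entry is never reused,
assert lookup(tp, 'f')() == 'new'    # because the type got a fresh tag
```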
See `Method cache (issue #1685986) <https://bugs.python.org/issue1685986>`_ and `Armin's method cache optimization updated for Python 2.6 (issue #1700288) <https://bugs.python.org/issue1700288>`_.
Globals / builtins cache ------------------------
In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue #10401) <http://bugs.python.org/issue10401>`_ which adds a private ``ma_version`` field to the ``PyDictObject`` structure (``dict`` type); the field has the C type ``Py_ssize_t``.
The patch adds a "global and builtin cache" to functions and frames, and changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the cache.
The change on the ``PyDictObject`` structure is very similar to this PEP.
Cached globals+builtins lookup ------------------------------
In 2006, Andrea Griffini proposed a patch implementing a `Cached globals+builtins lookup optimization <https://bugs.python.org/issue1616125>`_. The patch adds a private ``timestamp`` field to the ``PyDictObject`` structure (``dict`` type); the field has the C type ``size_t``.
Thread on python-dev: `About dictionary lookup caching <https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_.
Guard against changing dict during iteration --------------------------------------------
In 2013, Serhiy Storchaka proposed `Guard against changing dict during iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict`` type); the field has the C type ``size_t``. This field is incremented when the dictionary is modified, and so is very similar to the proposed dictionary version.
Sadly, the dictionary version proposed in this PEP doesn't help to detect dictionary mutation during iteration: the dictionary version changes when values are replaced, whereas replacing dictionary values while iterating over the dictionary keys is legitimate in Python.
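The distinction is easy to demonstrate with plain Python: replacing values while iterating over the keys is allowed (and each replacement would increment the proposed version), whereas changing the set of keys during iteration is an error, which is what a counter like ``ma_count`` is meant to catch:

```python
# Replacing values while iterating over the keys is legal Python:
d = {'a': 1, 'b': 2, 'c': 3}
for key in d:
    d[key] = d[key] * 10        # each replacement would bump the version
assert d == {'a': 10, 'b': 20, 'c': 30}

# ...whereas inserting or deleting a key during iteration raises an error:
raised = False
try:
    for key in d:
        d['new'] = 0
except RuntimeError:
    raised = True
assert raised
```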
PySizer -------
`PySizer <http://pysizer.8325.org/>`_: a memory profiler for Python, Google Summer of Code 2005 project by Nick Smallbone.
This project has a patch for CPython 2.4 which adds ``key_time`` and ``value_time`` fields to dictionary entries. It uses a global process-wide counter for dictionaries, incremented each time that a dictionary is modified. The times are used to decide when child objects first appeared in their parent objects.
Discussion ==========
Thread on the python-ideas mailing list: `RFC: PEP: Add dict.__version__ <https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_.
Copyright =========
This document has been placed in the public domain.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/fijall%40gmail.com
On Jan 11, 2016 8:09 PM, "Maciej Fijalkowski" <fijall@gmail.com> wrote:
Hi Victor.
You know that pypy does this stuff without changing and exposing python semantics right? We have a version dict that does not leak abstractions to the user.
The PEP adds a field to the C structure PyDictObject. Are you asking me to hide it from the C structure?

The first version of my PEP added a public read-only property at the Python level, but I changed the PEP. See the alternatives section for more detail.

Victor
In general, doing stuff like that where there is a public API that leaks details of certain optimizations makes it harder and harder for optimizing compilers to do their job properly, if you want to do something slightly different.
Can we make this happen (as you noted in the prior art) WITHOUT changing ANY of the things exposed to the user?
On Mon, Jan 11, 2016 at 6:49 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
On Mon, Jan 11, 2016 at 9:56 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
The PEP adds a field to the C structure PyDictObject. Are you asking me to hide it from the C structure?
The first version of my PEP added a public read-only property at Python level, but I changed the PEP. See the alternatives section for more detail.
Victor
I asked you to hide it from Python; I had read the wrong version :-) Cool!
On Mon, Jan 11, 2016 at 8:50 AM Victor Stinner <victor.stinner@gmail.com> wrote:
Changes =======
Add a ``ma_version`` field to the ``PyDictObject`` structure with the C type ``PY_UINT64_T`` (a 64-bit unsigned integer). New empty dictionaries are initialized to version ``0``. The version is incremented at each change:
* ``clear()`` if the dict was non-empty
* ``pop(key)`` if the key exists
* ``popitem()`` if the dict is non-empty
* ``setdefault(key, value)`` if the key does not exist
* ``__delitem__(key)`` if the key exists
* ``__setitem__(key, value)`` if the key doesn't exist or if the value is different
* ``update(...)`` if new values are different than existing values (the version can be incremented multiple times)
Please be more explicit about what tests you are performing on the values. setitem's "if the value is different" really should mean "if value is not dict['key']". Similarly for update: there should never be equality checks performed on the values, just an "is" test of whether they are the same object or not.
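The requested precision, incrementing only when the new object is not the stored object and never comparing by equality, can be sketched as a predicate (a hypothetical helper; the real test would be done in C). An ``==`` test would also be wrong for objects with unusual equality such as NaN, and could run arbitrary ``__eq__`` code:

```python
def should_increment(d, key, value):
    """Hypothetical version-bump rule: identity test, never equality."""
    return key not in d or d[key] is not value

d = {}
v = float('nan')                 # nan != nan, so an == test would misfire
assert should_increment(d, 'k', v)
d['k'] = v
assert not should_increment(d, 'k', v)         # same object: no bump
assert should_increment(d, 'k', float('nan'))  # different object: bump
```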
Example using an hypothetical ``dict_get_version(dict)`` function::
>>> d = {} >>> dict_get_version(d) 0 >>> d['key'] = 'value' >>> dict_get_version(d) 1 >>> d['key'] = 'new value' >>> dict_get_version(d) 2 >>> del d['key'] >>> dict_get_version(d) 3
If a dictionary is created with items, the version is also incremented at each dictionary insertion. Example::
>>> d=dict(x=7, y=33) >>> dict_get_version(d) 2
The version is not incremented if an existing key is set to the same value. For efficiency, values are compared by their identity: ``new_value is old_value``, not by their content: ``new_value == old_value``. Example::
>>> d={} >>> value = object() >>> d['key'] = value >>> dict_get_version(d) 2 >>> d['key'] = value >>> dict_get_version(d) 2
.. note:: CPython uses some singleton like integers in the range [-5; 257], empty tuple, empty strings, Unicode strings of a single character in the range [U+0000; U+00FF], etc. When a key is set twice to the same singleton, the version is not modified.
Implementation ==============
The `issue #26058: PEP 509: Add ma_version to PyDictObject <https://bugs.python.org/issue26058>`_ contains a patch implementing this PEP.
On pybench and timeit microbenchmarks, the patch does not seem to add any overhead on dictionary operations.
When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for a dictioanry lookup, whereas a guard check only takes 3.8 ns. Moreover, a guard can watch for multiple keys. For example, for an optimization using 10 global variables in a function, 10 dictionary lookups costs 148 ns, whereas the guard still only costs 3.8 ns when the version does not change (39x as fast).
Integer overflow ================
The implementation uses the C unsigned integer type ``PY_INT64_T`` to store the version, a 64 bits unsigned integer. The C code uses ``version++``. On integer overflow, the version is wrapped to ``0`` (and then continue to be incremented) according to the C standard.
After an integer overflow, a guard can succeed whereas the watched dictionary key was modified. The bug occurs if the dictionary is modified at least ``2 ** 64`` times between two checks of the guard and if the new version (theorical value with no integer overflow) is equal to the old version modulo ``2 ** 64``.
If a dictionary is modified every nanosecond, an overflow takes longer than 584 years. With a 32-bit version, the overflow would occur after only 4 seconds; that's why a 64-bit unsigned type is used even on 32-bit systems. For comparison, a dictionary lookup at the C level takes 14.8 ns.
A risk of a bug every 584 years is acceptable.
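The arithmetic behind these figures can be checked directly, assuming one dictionary modification per nanosecond:

```python
NS_PER_MODIFICATION = 1e-9            # assumed: one modification per nanosecond
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# 64-bit counter: 2**64 modifications before wrapping -> ~584.5 years
years_64 = 2 ** 64 * NS_PER_MODIFICATION / SECONDS_PER_YEAR

# 32-bit counter: 2**32 modifications before wrapping -> ~4.3 seconds
seconds_32 = 2 ** 32 * NS_PER_MODIFICATION
```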
Alternatives ============
Expose the version at Python level as a read-only __version__ property ----------------------------------------------------------------------
The first version of the PEP proposed to expose the dictionary version as a read-only ``__version__`` property at the Python level, and also to add the property to ``collections.UserDict`` (since this type must mimic the ``dict`` API).
There are multiple issues:
* To be consistent and avoid bad surprises, the version would have to be added to all mapping types. Implementing a new mapping type would require extra work for no benefit, since in practice the version is only required on the ``dict`` type.
* All Python implementations would have to implement this new property; it creates more work for other implementations, which may not use the dictionary version at all.
* The ``__version__`` value can wrap on integer overflow. It is error prone: using ``dict.__version__ <= guard_version`` is wrong; ``dict.__version__ == guard_version`` must be used instead to reduce the risk of a bug on integer overflow (even if an overflow is unlikely in practice).
* Exposing the dictionary version at the Python level can lead to false assumptions about performance. Checking ``dict.__version__`` at the Python level is not faster than a dictionary lookup. A dictionary lookup costs 48.7 ns and a guard check costs 47.5 ns: the difference is only 1.2 ns (3%)::
    $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
    10000000 loops, best of 3: 0.0487 usec per loop
    $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
    10000000 loops, best of 3: 0.0475 usec per loop
Bikeshedding on the property name:
* ``__cache_token__``: name proposed by Nick Coghlan, coming from `abc.get_cache_token() <https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_.
* ``__version__``
* ``__timestamp__``
Add a version to each dict entry --------------------------------
A single version per dictionary requires the guard to keep a strong reference to the value, which can keep the value alive longer than expected. If we also add a version per dictionary entry, the guard only has to store the entry version, avoiding the strong reference to the value (only strong references to the dictionary and to the key are needed).
Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure, the field has the C type ``PY_INT64_T``. When a key is created or modified, the entry version is set to the dictionary version which is incremented at any change (create, modify, delete).
Pseudo-code of a fast guard to check if a dictionary key was modified, using hypothetical ``dict_get_version(dict)`` and ``dict_get_entry_version(dict, key)`` functions::
    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.dict_version = dict_get_version(dict)
            self.entry_version = dict_get_entry_version(dict, key)

        def check(self):
            """Return True if the dictionary entry was not modified."""

            # read the version field of the dict structure
            dict_version = dict_get_version(self.dict)
            if dict_version == self.dict_version:
                # Fast path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            entry_version = dict_get_entry_version(self.dict, self.key)
            if entry_version == self.entry_version:
                # another key was modified:
                # cache the new dictionary version
                self.dict_version = dict_version
                return True

            # the key was modified
            return False
The main drawback of this option is its impact on the memory footprint. It increases the size of each dictionary entry, so the overhead depends on the number of buckets (dictionary entries, used or not). For example, it increases the size of each dictionary entry by 8 bytes on 64-bit systems.
In Python, the memory footprint matters and the trend is to reduce it. Examples:
* `PEP 393 -- Flexible String Representation <https://www.python.org/dev/peps/pep-0393/>`_ * `PEP 412 -- Key-Sharing Dictionary <https://www.python.org/dev/peps/pep-0412/>`_
Add a new dict subtype ----------------------
Add a new ``verdict`` type, subtype of ``dict``. When guards are needed, use the ``verdict`` for namespaces (module namespace, type namespace, instance namespace, etc.) instead of ``dict``.
Leave the ``dict`` type unchanged to not add any overhead (memory footprint) when guards are not needed.
Technical issue: a lot of C code in the wild, including the CPython core, expects the exact ``dict`` type. Issues:
* ``exec()`` requires a ``dict`` for globals and locals. A lot of code uses ``globals={}``. It is not possible to cast the ``dict`` to a ``dict`` subtype, because the caller expects its ``globals`` parameter to be modified (``dict`` is mutable).
* Functions call ``PyDict_xxx()`` functions directly, instead of calling ``PyObject_xxx()`` functions which would also work on a ``dict`` subtype.
* The ``PyDict_CheckExact()`` check fails on a ``dict`` subtype, whereas some functions require the exact ``dict`` type.
* ``Python/ceval.c`` does not completely support dict subtypes for namespaces.
The ``exec()`` issue is a blocker issue.
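The first point can be seen from pure Python: ``exec()`` mutates the caller's own ``globals`` dict in place, so copying it into a different type would hide the mutation from the caller. A minimal illustration:

```python
g = {}
exec("answer = 42", g)

# the caller's own dict object was modified in place; copying it into a
# hypothetical dict subtype before execution would not be visible here
assert g["answer"] == 42
```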
Other issues:
* The garbage collector has special code to "untrack" ``dict`` instances. If a ``dict`` subtype is used for namespaces, the garbage collector may be unable to break some reference cycles.
* Some functions have a fast-path for ``dict`` which would not be taken for ``dict`` subtypes, making Python a little bit slower.
Prior Art =========
Method cache and type version tag ---------------------------------
In 2007, Armin Rigo wrote a patch to implement a method cache. It was merged into Python 2.6. The patch adds a "type attribute cache version tag" (``tp_version_tag``) and a "valid version tag" flag to types (the ``PyTypeObject`` structure).
The type version tag is not available at the Python level.
The version tag has the C type ``unsigned int``. The cache is a global hash table of 4096 entries, shared by all types. The cache is global to "make it fast, have a deterministic and low memory footprint, and be easy to invalidate". Each cache entry has a version tag. A global version tag is used to create the next version tag; it also has the C type ``unsigned int``.
By default, a type has its "valid version tag" flag cleared to indicate that the version tag is invalid. When the first method of the type is cached, the version tag and the "valid version tag" flag are set. When a type is modified, the "valid version tag" flag of the type and its subclasses is cleared. Later, when a cache entry of these types is used, the entry is removed because its version tag is outdated.
On integer overflow, the whole cache is cleared and the global version tag is reset to ``0``.
See `Method cache (issue #1685986) <https://bugs.python.org/issue1685986>`_ and `Armin's method cache optimization updated for Python 2.6 (issue #1700288) <https://bugs.python.org/issue1700288>`_.
Globals / builtins cache ------------------------
In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue #10401) <http://bugs.python.org/issue10401>`_ which adds a private ``ma_version`` field to the ``PyDictObject`` structure (``dict`` type), the field has the C type ``Py_ssize_t``.
The patch adds a "global and builtin cache" to functions and frames, and changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the cache.
The change on the ``PyDictObject`` structure is very similar to this PEP.
Cached globals+builtins lookup ------------------------------
In 2006, Andrea Griffini proposed a patch implementing a `Cached globals+builtins lookup optimization <https://bugs.python.org/issue1616125>`_. The patch adds a private ``timestamp`` field to the ``PyDictObject`` structure (``dict`` type), the field has the C type ``size_t``.
Thread on python-dev: `About dictionary lookup caching <https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_.
Guard against changing dict during iteration --------------------------------------------
In 2013, Serhiy Storchaka proposed `Guard against changing dict during iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict`` type), the field has the C type ``size_t``. This field is incremented when the dictionary is modified, and so is very similar to the proposed dictionary version.
Sadly, the dictionary version proposed in this PEP doesn't help to detect dictionary mutation during iteration. The dictionary version changes when values are replaced, whereas modifying dictionary values while iterating on the dictionary keys is legitimate in Python.
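The distinction can be demonstrated in pure Python: replacing values during iteration is allowed (and would bump the proposed version), while resizing the dict during iteration raises an error:

```python
d = {"a": 1, "b": 2}

# replacing values while iterating over the keys is legitimate
for key in d:
    d[key] = d[key] * 10

# adding or removing a key during iteration raises RuntimeError
changed_size_detected = False
try:
    for key in d:
        d["c"] = 3
except RuntimeError:
    changed_size_detected = True
```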
PySizer -------
`PySizer <http://pysizer.8325.org/>`_: a memory profiler for Python, Google Summer of Code 2005 project by Nick Smallbone.
This project has a patch for CPython 2.4 which adds ``key_time`` and ``value_time`` fields to dictionary entries. It uses a global process-wide counter for dictionaries, incremented each time that a dictionary is modified. The times are used to decide when child objects first appeared in their parent objects.
Discussion ==========
Thread on the python-ideas mailing list: `RFC: PEP: Add dict.__version__ <https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_.
Copyright =========
This document has been placed in the public domain. _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/greg%40krypto.org
2016-01-12 0:07 GMT+01:00 Gregory P. Smith <greg@krypto.org>:
Changes =======
(...)
Please be more explicit about what tests you are performing on the values. setitem's "if the value is different" really should mean "if value is not dict['key']". similarly for update, there should never be equality checks performed on the values. just an "is" test of whether they are the same object or not.
Ok, done. By the way, it's also explained below: values are compared by their identity, not by their content.

For best dict efficiency, we could skip this micro-optimization (to avoid a potential branch misprediction in the CPU) and always increment the version. But for guards, the micro-optimization avoids a lot of dictionary lookups, especially when a guard watches a large number of keys.

Victor
On Jan 11, 2016, at 15:24, Victor Stinner <victor.stinner@gmail.com> wrote:
2016-01-12 0:07 GMT+01:00 Gregory P. Smith <greg@krypto.org>:
Changes =======
(...)
Please be more explicit about what tests you are performing on the values. setitem's "if the value is different" really should mean "if value is not dict['key']". similarly for update, there should never be equality checks performed on the values. just an "is" test of whether they are the same object or not.
Ok, done. By the way, it's also explained below: values are compared by their identity, not by their content.
For best dict efficiency, we could skip this micro-optimization (to avoid a potential branch misprediction in the CPU) and always increment the version. But for guards, the micro-optimization avoids a lot of dictionary lookups, especially when a guard watches a large number of keys.
Are you saying that d[key] = d[key] may or may not increment the version, so any optimizer can't rely on the fact that it doesn't? If so, that seems reasonable. (The worst case of incrementing the version unnecessarily is that you miss an optimization that would have been safe, right?)
Well, it was just a remark. 2016-01-12 0:35 GMT+01:00 Andrew Barnert <abarnert@yahoo.com>:
Are you saying that d[key] = d[key] may or may not increment the version, so any optimizer can't rely on the fact that it doesn't?
Optimizers don't have to rely on this exact behaviour. Not incrementing the version in this case avoids dictionary lookups in the guard. My current patch does not increment it if the value is the same, and I'm unable to see any performance regression on *micro* benchmarks: https://bugs.python.org/issue26058 So I'm in favor of making guards as efficient as possible and not incrementing the version in dict ;-) Victor
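Identity checks are also safer than equality checks: ``==`` can invoke arbitrary user code via ``__eq__``, while ``is`` cannot. A minimal sketch with a hypothetical class:

```python
class LoudEq:
    """Toy value whose __eq__ runs arbitrary Python code."""
    calls = 0

    def __eq__(self, other):
        LoudEq.calls += 1     # side effect: equality re-enters Python code
        return True

old = LoudEq()
new = LoudEq()

# identity check: constant time, never calls back into Python code
same = new is old             # False; LoudEq.calls is still 0

# equality check: invokes LoudEq.__eq__
equal = (new == old)          # True; LoudEq.calls is now 1
```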
Is someone opposed to this PEP 509? The main complaint was the change to the public Python API, but the PEP doesn't change the Python API anymore. I'm not aware of any remaining issue with this PEP.

Victor

2016-01-11 17:49 GMT+01:00 Victor Stinner <victor.stinner@gmail.com>:
On Jan 18, 2016, at 11:43 PM, Victor Stinner wrote:
Is someone opposed to this PEP 509?
The main complaint was the change to the public Python API, but the PEP doesn't change the Python API anymore.
I'm not aware of any remaining issue on this PEP.
Have you done any performance analysis for a wide range of Python applications? Did you address my suggestion on python-ideas to make the new C API optionally compiled in?

I still think this is maintenance and potential performance overhead we don't want to commit to long term unless it enables significant optimization. Since you probably can't prove that without some experimentation, this API should be provisional.

Cheers, -Barry
On 2016-01-18 5:43 PM, Victor Stinner wrote:
Is someone opposed to this PEP 509?
The main complaint was the change to the public Python API, but the PEP doesn't change the Python API anymore.
I'm not aware of any remaining issue on this PEP.
Victor,

I've been experimenting with the PEP to implement a per-opcode cache in the ceval loop (I'll share my progress on that in a few days). This allows significantly speeding up the LOAD_GLOBAL and LOAD_METHOD opcodes, to the point where they don't require any dict lookups at all. Some macro-benchmarks (such as chameleon_v2) demonstrate an impressive ~10% performance boost.

I rely on your dict->ma_version to implement cache invalidation.

However, besides guarding against version change, I also want to guard against the dict being swapped for another dict, to avoid situations like this::

    def foo():
        print(bar)

    exec(foo.__code__, {'bar': 1}, {})
    exec(foo.__code__, {'bar': 2}, {})

What I propose is to add a pointer "ma_extra" (same 64 bits), which will be set to NULL for most dict instances (instead of ma_version). "ma_extra" can then point to a struct that has a globally unique dict ID (uint64) and a version tag (uint64). Macros like PyDict_GET_ID and PyDict_GET_VERSION could then efficiently fetch the version/unique ID of the dict for guards.

"ma_extra" would also make it easier for us to extend dicts in the future.

Yury
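The idea can be sketched in pure Python: give each namespace dict a process-wide unique id plus a version, and have the guard compare both, so a swapped dict fails the guard even if its version happens to match. All names below are hypothetical, for illustration only:

```python
import itertools

_next_dict_id = itertools.count(1)

class ExtraDict(dict):
    """Model of the proposed ma_extra: unique id + version tag."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.unique_id = next(_next_dict_id)   # globally unique, never reused
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

class Guard:
    def __init__(self, d):
        self.unique_id = d.unique_id
        self.version = d.version

    def check(self, d):
        # fails both when the dict was modified and when it was swapped
        # for another dict with a coincidentally equal version
        return d.unique_id == self.unique_id and d.version == self.version
```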
The easiest version is to have global numbering (as opposed to local). Anyway, I would strongly suggest getting some benchmarks done and showing performance benefits first, because you don't want PEPs to be final when you don't exactly know the details. On Wed, Jan 20, 2016 at 7:02 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2016-01-18 5:43 PM, Victor Stinner wrote:
[..]
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/fijall%40gmail.com
On Wed, 20 Jan 2016 at 10:11 Yury Selivanov <yselivanov.ml@gmail.com> wrote:
[..]
Victor,
I've been experimenting with the PEP to implement a per-opcode cache in ceval loop (I'll share my progress on that in a few days). This allows to significantly speedup LOAD_GLOBAL and LOAD_METHOD opcodes, to the point, where they don't require any dict lookups at all. Some macro-benchmarks (such as chameleon_v2) demonstrate impressive ~10% performance boost.
Ooh, now my brain is trying to figure out the design of the cache. :)
[..]
What I propose is to add a pointer "ma_extra" (same 64 bits), which will be set to NULL for most dict instances (instead of ma_version). "ma_extra" can then point to a struct that has a globally unique dict ID (uint64) and a version tag (uint64). Macros like PyDict_GET_ID and PyDict_GET_VERSION could then efficiently fetch the version/unique ID of the dict for guards.
"ma_extra" would also make it easier for us to extend dicts in the future.
Why can't you simply use the id of the dict object as the globally unique dict ID? It's already globally unique amongst all Python objects which makes it inherently unique amongst dicts.
On Wed, Jan 20, 2016 at 7:22 PM, Brett Cannon <brett@python.org> wrote:
[..]
Why can't you simply use the id of the dict object as the globally unique dict ID? It's already globally unique amongst all Python objects which makes it inherently unique amongst dicts.
Brett, you need two things - the ID of the dict and the version tag. What we do in pypy is we have a small object (called, surprisingly, VersionTag()) and we use the ID of that. That way you can change the version id of an existing dict and have only one field.
On 2016-01-20 1:36 PM, Maciej Fijalkowski wrote:
On Wed, Jan 20, 2016 at 7:22 PM, Brett Cannon <brett@python.org> wrote:
On Wed, 20 Jan 2016 at 10:11 Yury Selivanov <yselivanov.ml@gmail.com> wrote:
[..]
"ma_extra" would also make it easier for us to extend dicts in the future.
Why can't you simply use the id of the dict object as the globally unique dict ID? It's already globally unique amongst all Python objects which makes it inherently unique amongst dicts.
Brett, you need two things - the ID of the dict and the version tag. What we do in pypy is we have a small object (called, surprisingly, VersionTag()) and we use the ID of that. That way you can change the version id of an existing dict and have only one field.
Yeah, that's essentially what I propose with ma_extra. Yury
On Wed, Jan 20, 2016 at 8:00 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
[..]
Yeah, that's essentially what I propose with ma_extra.
Yury
The trick is we use only one field :-) you're proposing to have both fields - version tag and dict id. Why not just use the id of the object (without any fields)?
On 2016-01-20 2:02 PM, Maciej Fijalkowski wrote:
On Wed, Jan 20, 2016 at 8:00 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
[..]
Brett, you need two things - the ID of the dict and the version tag. What we do in pypy is we have a small object (called, surprisingly, VersionTag()) and we use the ID of that. That way you can change the version id of an existing dict and have only one field.
Yeah, that's essentially what I propose with ma_extra. Yury
The trick is we use only one field :-) you're proposing to have both fields - version tag and dict id. Why not just use the id of the object (without any fields)?
What if your old dict is GCed, its "VersionTag()" (1) object is freed, and you have a new dict, for which a new "VersionTag()" (2) object happens to be allocated at the same address as (1)? Yury
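Yury's reuse hazard can be sketched concretely. Whether the address is actually recycled depends on CPython's allocator, so the sketch asserts nothing about the outcome; it only shows why a guard storing a raw id() (instead of a reference to the tag object, which is Maciej's answer below) is unsafe:

```python
# Sketch of the id-reuse hazard: if a guard stores only the integer
# id() of a VersionTag object rather than a reference to the object
# itself, a freed tag's address can be recycled for a new one.
# Whether reuse actually happens is allocator-dependent, so no
# outcome is asserted here.

class VersionTag:
    pass

tag1 = VersionTag()
stale_id = id(tag1)    # a guard storing only this integer...
del tag1               # ...keeps it alive past the object's death

tag2 = VersionTag()    # CPython may place this at the recycled address
print(id(tag2) == stale_id)  # frequently True under CPython's
                             # allocator, though not guaranteed
```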
On Wed, Jan 20, 2016 at 8:08 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2016-01-20 2:02 PM, Maciej Fijalkowski wrote:
[..]
What if your old dict is GCed, its "VersionTag()" (1) object is freed, and you have a new dict, for which a new "VersionTag()" (2) object happens to be allocated at the same address as (1)?
Yury
You don't free a version tag that's stored in the guard. You store the object and not id
On 2016-01-20 2:09 PM, Maciej Fijalkowski wrote:
You don't free a version tag that's stored in the guard. You store the object and not id
Ah, got it. Although for my current cache design it would be more memory efficient to use the dict itself to store its own unique id and tag, hence my "ma_extra" proposal. In any case, the current "ma_version" proposal is flawed :( Yury
There is also the problem that you don't want it on all dicts, so having two extra words is more to pay than having extra objects (also, comparison is cheaper for guards).
On Wed, Jan 20, 2016 at 8:23 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
[..]
On 1/20/2016 10:36 AM, Maciej Fijalkowski wrote:
[..]
Brett, you need two things - the ID of the dict and the version tag. What we do in pypy is we have a small object (called, surprisingly, VersionTag()) and we use the ID of that. That way you can change the version id of an existing dict and have only one field.
For the reuse case, can't you simply keep the ma_version "live" in dict items on the free list, rather than starting over at (presumably) 0? Then if the dict is reused, it bumps the ma_version, and the fallback code goes on with (presumably) relocating the original dict (oops, it's gone), and dealing with the fallout.
Then you can use the regular dict id as the globally unique dict id? And only need the one uint64, rather than a separately allocated extra pair of uint64?
On 2016-01-20 2:45 PM, Glenn Linderman wrote:
For the reuse case, can't you simply keep the ma_version "live" in dict items on the free list, rather than starting over at (presumably) 0 ? Then if the dict is reused, it bumps the ma_version, and the fallback code goes on with (presumably) relocating the original dict (oops, it's gone), and dealing with the fallout.
Not all dicts are created from a freelist, and not all dicts go to the freelist when they are GCed. You still can have this situation:

- dict "A" is used as f_locals for a frame, its ma_version is set to X
- dict "A" is GCed, but the freelist is full, so it's just freed
- after a while, you call the code object; a new dict "B" is allocated with malloc (since now the freelist happens to be empty) for f_locals
- it happens that "B" is allocated where "A" was, and its ma_version happens to be X by accident
Then you can use the regular dict id as the globally unique dict id? And only need the one uint64, rather than a separately allocated extra pair of uint64?
In my design only namespace dicts will have a non-NULL ma_extra, which means that only a fraction of dicts will actually have a separate pair of uint64s. I think that we should either use one global ma_version (as Maciej suggested) or ma_extra, as it gives more flexibility for us to extend dicts in the future. A global (shared between all dicts) uint64 ma_version is actually quite reliable -- if a program does 1,000,000 dict modifications per second, it would take 600,000 years till wrap-around. Yury
On Wed, 20 Jan 2016 at 12:27 Yury Selivanov <yselivanov.ml@gmail.com> wrote:
[..]
I think that we should either use one global ma_version (as Maciej suggested) or ma_extra, as it gives more flexibility for us to extend dicts in the future.
A global (shared between all dicts) uint64 ma_version is actually quite reliable -- if a program does 1,000,000 dict modifications per second, it would take 600,000 years till wrap-around.
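Yury's wrap-around figure is easy to verify (simple arithmetic, assuming nothing beyond a 64-bit counter and the stated modification rate):

```python
# Back-of-the-envelope check of the uint64 wrap-around estimate:
# at 1,000,000 dict modifications per second, how long until a
# 64-bit global counter overflows?

mods_per_second = 1_000_000
seconds_per_year = 365 * 24 * 3600          # ~31.5 million seconds

years_to_wrap = 2**64 / mods_per_second / seconds_per_year
print(round(years_to_wrap))                 # ~584942, i.e. roughly 600,000 years
```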
Since you're going to need to hold the GIL for the modifications there won't be any locking or contention problems, so it sounds like the global value is the best since that's simple, uses the least amount of memory, and will be easiest to use as a modification check since that will be a simple uint64 comparison instead of comparing a GUID + version.
On 1/20/2016 12:50 PM, Brett Cannon wrote:
A global (shared between all dicts) uint64 ma_version is actually quite reliable -- if a program does 1,000,000 dict modifications per second, it would take 600,000 years till wrap-around.
But would invalidate everything, instead of just a fraction of things, on every update to anything that is monitored...
Hi, 2016-01-20 22:18 GMT+01:00 Glenn Linderman <v+python@g.nevcal.com>:
On 1/20/2016 12:50 PM, Brett Cannon wrote:
A global (shared between all dicts) uint64 ma_version is actually quite reliable -- if a program does 1,000,000 dict modifications per second, it would take 600,000 years till wrap-around.
I think that Yury found a bug in FAT Python. I didn't test the case when the builtins dictionary is replaced after the definition of the function. To be more concrete: when a function is executed in a different namespace using exec(code, namespace). That's why I like the PEP process: it helps to find all issues before going too far :-) I like the idea of a global counter for dictionary versions. It means that the dictionary constructor increments this counter instead of always starting at 0. FYI a fat.GuardDict keeps a strong reference to the dictionary. For some guards, I hesitated between storing the object identifier and using a weak reference. An object identifier is not reliable because the object can be destroyed and a new object, completely different or of the same type, can get the same identifier.
But would invalidate everything, instead of just a fraction of things, on every update to anything that is monitored...
I don't understand this point. In short, the guard only has to compare two 64-bit integers in the fast path, when nothing changed. For a namespace, it means that no value was replaced in this namespace. If a different namespace is modified, the version of the watched namespace does not change, so we are still on the fast path. If a value is replaced in the watched namespace, but not the watched variable, we have to take a slow path, hopefully only once. The worst case is when a value different from the watched value is modified between each guard check. In this case, we always need a dict lookup. A heuristic can be chosen to give up after N tries. Currently, fat.GuardDict always retries. Victor
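The fast-path/slow-path logic Victor describes can be sketched in pure Python. GuardDict below is a simplified model, not the real fat.GuardDict API, and VersionedDict is a made-up stand-in for a dict carrying the PEP's ma_version field:

```python
# Pure-Python model of a namespace guard: the fast path compares two
# integers; the slow path re-checks the watched key and re-arms the
# guard if only an unrelated key changed. Illustrative names only.

class VersionedDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.ma_version = 0
        for key, value in dict(*args, **kwargs).items():
            self[key] = value              # routed through __setitem__

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.ma_version += 1               # bump on every mutation

class GuardDict:
    def __init__(self, namespace, key):
        self.namespace = namespace          # strong reference, as in fat.GuardDict
        self.key = key
        self.value = namespace[key]         # watched value at guard creation
        self.version = namespace.ma_version

    def check(self):
        """Return True while the watched value is unchanged."""
        if self.namespace.ma_version == self.version:
            return True                     # fast path: one integer compare
        # Slow path: the dict changed, but maybe not the watched key.
        current = self.namespace.get(self.key)
        if current is self.value:           # identity stands in for the C pointer compare
            self.version = self.namespace.ma_version   # re-arm the fast path
            return True
        return False                        # watched value really changed

ns = VersionedDict(bar=1)
guard = GuardDict(ns, 'bar')
assert guard.check()        # fast path
ns['other'] = 2             # unrelated key bumps the version
assert guard.check()        # slow path once, guard re-armed
assert guard.check()        # fast path again
ns['bar'] = 99
assert not guard.check()    # watched value really changed
```

The worst case Victor mentions is visible here too: if some other key is modified between every check(), each check() takes the slow path and does a dict lookup.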
On Wed, 20 Jan 2016 at 15:46 Victor Stinner <victor.stinner@gmail.com> wrote:
[..]
But would invalidate everything, instead of just a fraction of things, on every update to anything that is monitored...
I don't understand this point.
I think Glenn was assuming we had a single, global version # that all dicts shared *without* having a per-dict version ID. The key thing here is that we have a global counter that tracks the number of mutations for *all* dictionaries but whose value we store as a *per-dictionary* value. That ends up making the version ID inherently both a token representing the state of any dict but also the uniqueness of the dict since no two dictionaries will ever have the same version ID.
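The scheme Brett describes can be sketched as follows (names are illustrative, not CPython's): one process-wide counter whose successive values are stored per dict, so that every mutation, including creation, consumes a fresh value and no two dicts ever share a version.

```python
# Model of a single global counter whose values are stored per dict:
# the stored version is both a state token and a unique identity,
# because no value is ever handed out twice. Illustrative sketch only.

import itertools

_versions = itertools.count(1)   # shared by all dicts; GIL-protected in CPython

class GloballyVersionedDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.ma_version = next(_versions)    # creation takes a fresh value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.ma_version = next(_versions)    # so does every mutation

d1 = GloballyVersionedDict(bar=1)
d2 = GloballyVersionedDict(bar=2)
d1['bar'] = 3
d2['bar'] = 3
# Unlike per-dict counters that start at 0, the versions never collide:
assert d1.ma_version != d2.ma_version
```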
[..]
The worst case is when a value different than the watched value is modified between each guard check. In this case, we always need a dict lookup. An heuristic can be chosen to decide to give up after N tries. Currently, fat.GuardDict always retries.
Does "retries" mean "check if the value really changed, and if it hasn't then just update the version ID the guard checks"?
On 1/20/2016 4:08 PM, Brett Cannon wrote:
[..]
I think Glenn was assuming we had a single, global version # that all dicts shared *without* having a per-dict version ID. The key thing here is that we have a global counter that tracks the number of mutations for *all* dictionaries but whose value we store as a *per-dictionary* value. That ends up making the version ID inherently both a token representing the state of any dict but also the uniqueness of the dict since no two dictionaries will ever have the same version ID.
This would work. You were correct about my assumptions.
2016-01-21 1:08 GMT+01:00 Brett Cannon <brett@python.org>:
[..]
Does "retries" mean "check if the value really changed, and if it hasn't then just update the version ID the guard checks"?
If the dict version changes (because a value different from the watched value is modified) each time the guard is checked, the guard always requires a dict lookup to check whether the watched value changed. Victor
On Wednesday, January 20, 2016 4:10 PM, Brett Cannon <brett@python.org> wrote:
I think Glenn was assuming we had a single, global version # that all dicts shared without having a per-dict version ID. The key thing here is that we have a global counter that tracks the number of mutations for all dictionaries but whose value we store as a per-dictionary value. That ends up making the version ID inherently both a token representing the state of any dict but also the uniqueness of the dict since no two dictionaries will ever have the same version ID.
This idea worries me. I'm not sure why, but I think because of threading. After all, it's pretty rare for two threads to both want to work on the same dict, but very, very common for two threads to both want to work on _any_ dict. So, imagine someone manages to remove the GIL from CPython by using STM: now most transactions are bumping that global counter, meaning most transactions fail and have to be retried, so you end up with 8 cores each running at 1/64th the speed of a single core but burning 100% CPU. Obviously a real-life implementation wouldn't be _that_ stupid; you'd special-case the version-bumping (maybe unconditionally bump it N times before starting the transaction, and then as long as you don't bump more than N times during the transaction, you can commit without touching it), but there's still going to be a lot of contention.

And that also affects something like PyPy being able to use FAT-Python-style AoT optimizations via cpyext. At first glance that sounds like a stupid idea--why would you want to run an optimizer through a slow emulator? But the optimizer only runs once and transforms the function code, which runs a zillion times, so who cares how slow the optimizer is? Of course it may still be true that many of the AoT optimizations that FAT makes don't apply very well to PyPy, in which case it doesn't matter. But I don't think we can assume that a priori.

Is there a way to define this loosely enough so that the implementation _can_ be a single global counter, if that turns out to be most efficient, but can also be a counter per dictionary and a globally-unique ID per dictionary?
On Wed, 20 Jan 2016, 17:54 Andrew Barnert <abarnert@yahoo.com> wrote:
[..]
This idea worries me. I'm not sure why, but I think because of threading. After all, it's pretty rare for two threads to both want to work on the same dict, but very, very common for two threads to both want to work on _any_ dict. So, imagine someone manages to remove the GIL from CPython by using STM: now most transactions are bumping that global counter, meaning most transactions fail and have to be retried, so you end up with 8 cores each running at 1/64th the speed of a single core but burning 100% CPU. Obviously a real-life implementation wouldn't be _that_ stupid; you'd special-case the version-bumping (maybe unconditionally bump it N times before starting the transaction, and then as long as you don't bump more than N times during the transaction, you can commit without touching it), but there's still going to be a lot of contention.
This is all being regarded as an implementation detail of CPython, so in this hypothetical STM world we can drop all of this (or lock it).
[..]
Is there a way to define this loosely enough so that the implementation _can_ be a single global counter, if that turns out to be most efficient, but can also be a counter per dictionary and a globally-unique ID per dictionary?
There's no need to if this is all under the hood and in no way affects anyone but the eval loop and those who choose to use it. We can make sure to preface all of this with underscores so it's obvious they are private and so use at your own peril.
On 2016-01-20 8:54 PM, Andrew Barnert via Python-Dev wrote:
[..]
Well, PEP 509 proposes to add ma_version only for CPython. It's an implementation detail of CPython. Victor's FAT optimizer is also tailored specifically for CPython, and making it work on PyPy would require a completely different set of hacks. To remove the GIL or implement an efficient STM one will have to rewrite (and potentially break) so much code in CPython, that ma_version won't be a concern. For now, though, ma_version will be protected by GIL, so threading shouldn't be a problem.
And that also affects something like PyPy being able to use FAT-Python-style AoT optimizations via cpyext. At first glance that sounds like a stupid idea--why would you want to run an optimizer through a slow emulator? But the optimizer only runs once and transforms the function code, which runs a zillion times, so who cares how slow the optimizer is? Of course it may still be true that many of the AoT optimizations that FAT makes don't apply very well to PyPy, in which case it doesn't matter. But I don't think we can assume that a priori.
The idea of FAT is that it will also generate optimized code objects with guards. I doubt it would make any sense to use it under PyPy or any jitted Python implementation. JITs have a far better understanding of the code than any static optimizer.
Is there a way to define this loosely enough so that the implementation _can_ be a single global counter, if that turns out to be most efficient, but can also be a counter per dictionary and a globally-unique ID per dictionary?
Defining it "loosely" means that you can't trust it. I'd just explicitly say that:

- ma_version is an implementation detail of CPython and may not be implemented on other platforms;
- ma_version can be removed from future CPython releases;
- ma_version can be used by code optimizers tailored specifically for CPython and by CPython itself.

Yury
Andrew Barnert via Python-Dev wrote:
imagine someone manages to remove the GIL from CPython by using STM: now most transactions are bumping that global counter, meaning most transactions fail and have to be retried,
If this becomes a problem, the tag could be split into two parts of m and n bits, with m + n = 64. Use a global counter for allocating the high half, and increment the low half locally. When the low half overflows, allocate a new high half. A value of n = 16 or so ought to reduce contention for the global counter to something fairly negligible, I would think, without much risk of the high half ever wrapping around. -- Greg
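Greg's split-tag scheme can be sketched as follows. The values of m, n, and the allocator are illustrative; the point is that the globally contended counter is touched only once per 2**n local bumps:

```python
# Sketch of the split version tag: the high m bits come from a global
# allocator, the low n bits are bumped locally. With n = 16 the global
# counter is touched only once per 65536 local bumps, reducing
# contention in a hypothetical GIL-free CPython. Illustrative only.

N_LOW_BITS = 16
LOW_MASK = (1 << N_LOW_BITS) - 1

_next_high = 0  # global allocator for the high half

def _alloc_high():
    global _next_high
    _next_high += 1                  # the only globally contended step
    return _next_high << N_LOW_BITS  # fresh high half, low half zeroed

class SplitVersion:
    def __init__(self):
        self.tag = _alloc_high()

    def bump(self):
        if (self.tag & LOW_MASK) == LOW_MASK:  # low half about to overflow
            self.tag = _alloc_high()           # allocate a new high half
        else:
            self.tag += 1                      # cheap local increment

v = SplitVersion()
first = v.tag
for _ in range(70000):          # more than 2**16 bumps forces a new high half
    v.bump()
assert v.tag >> N_LOW_BITS != first >> N_LOW_BITS
```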
2016-01-21 2:54 GMT+01:00 Andrew Barnert <abarnert@yahoo.com>:
This idea worries me. I'm not sure why, but I think because of threading. After all, it's pretty rare for two threads to both want to work on the same dict, but very, very common for two threads to both want to work on _any_ dict. So, imagine someone manages to remove the GIL from CPython by using STM: (...)
That's a huge project :-) PyPy works on this, but PyPy is not CPython.
(...) now most transactions are bumping that global counter, meaning most transactions fail and have to be retried,
Python has atomic types which work well with multiple concurrent threads.
And that also affects something like PyPy being able to use FAT-Python-style AoT optimizations via cpyext.
I don't think that using a static optimizer with PyPy makes sense. Reminder: a dynamic optimizer (JIT compiler) beats a static compiler on performance ;-) PyPy has a very different design; it's a very bad idea to implement optimizations on top of cpyext, which is emulated and known to be slow.
Is there a way to define this loosely enough so that the implementation _can_ be a single global counter, if that turns out to be most efficient, but can also be a counter per dictionary and a globally-unique ID per dictionary?
I don't see how a single global counter for all dictionaries can be used to implement fast guards on namespaces. See the rationale of PEP 509: I wrote it to implement fast guards on namespaces. Victor
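For reference, the per-dict guard the PEP enables is roughly this cheap: the guard caches the dict pointer and its version, and the check is two loads and two compares, with no hashing or probing. This is a standalone C model, not CPython's actual structs; the field names mirror the PEP's ma_version.

```c
#include <stdint.h>

/* Minimal model of a version-based namespace guard (illustrative only). */
typedef struct {
    uint64_t ma_version;  /* bumped on every change to the dict */
} Dict;

typedef struct {
    Dict *dict;           /* the namespace the guard watches */
    uint64_t version;     /* version observed when the guard was created */
} Guard;

/* Returns nonzero if the cached lookup result is still valid. */
static int guard_check(const Guard *g, const Dict *d) {
    return g->dict == d && g->version == d->ma_version;
}
```

With a single global counter instead, any mutation of any dict would invalidate every guard, which is why a per-dict version is needed for this use case.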
Brett, On 2016-01-20 1:22 PM, Brett Cannon wrote:
On Wed, 20 Jan 2016 at 10:11 Yury Selivanov <yselivanov.ml@gmail.com <mailto:yselivanov.ml@gmail.com>> wrote:
On 2016-01-18 5:43 PM, Victor Stinner wrote:
> Is someone opposed to this PEP 509?
>
> The main complaint was the change on the public Python API, but the PEP
> doesn't change the Python API anymore.
>
> I'm not aware of any remaining issue on this PEP.
Victor,
I've been experimenting with the PEP to implement a per-opcode cache in the ceval loop (I'll share my progress on that in a few days). This makes it possible to significantly speed up the LOAD_GLOBAL and LOAD_METHOD opcodes, to the point where they don't require any dict lookups at all. Some macro-benchmarks (such as chameleon_v2) demonstrate an impressive ~10% performance boost.
Ooh, now my brain is trying to figure out the design of the cache. :)
Yeah, it's tricky. I'll need some time to draft a comprehensible overview. And I want to implement a couple more optimizations and benchmark it better. BTW, I have some updates (html5lib benchmark for py3, new benchmarks for calling C methods, and I want to port some PyPy benchmarks) to the benchmarks suite. Should I just commit them, or should I use bugs.python.org?
I rely on your dict->ma_version to implement cache invalidation.
However, besides guarding against version change, I also want to guard against the dict being swapped for another dict, to avoid situations like this:
def foo(): print(bar)

exec(foo.__code__, {'bar': 1}, {})
exec(foo.__code__, {'bar': 2}, {})
What I propose is to add a pointer "ma_extra" (same 64 bits), which will be set to NULL for most dict instances (instead of ma_version). "ma_extra" can then point to a struct that has a globally unique dict ID (uint64) and a version tag (uint64). Macros like PyDict_GET_ID and PyDict_GET_VERSION could then efficiently fetch the version/unique ID of the dict for guards.
"ma_extra" would also make it easier for us to extend dicts in the future.
Why can't you simply use the id of the dict object as the globally unique dict ID? It's already globally unique amongst all Python objects which makes it inherently unique amongst dicts.
We have a freelist for dicts -- so if the dict dies, there could be a new dict in its place, with the same id() and possibly even the same ma_version. While the probability of such hiccups is low, we still have to account for it. Yury
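The hazard is the same one malloc exhibits: a freed block's address can be handed right back to the next allocation of the same size, just as CPython's dict freelist recycles dict slots. A hedged C analogy (reuse is likely on common allocators but not guaranteed, so the sketch only reports what happened):

```c
#include <stdint.h>
#include <stdlib.h>

/* Demonstrate why an object's address (Python's id()) is not a safe
 * unique key across the object's lifetime: free a block, allocate an
 * equally-sized one, and the old address may come back. */
static int address_reused(void) {
    void *a = malloc(64);
    uintptr_t old = (uintptr_t)a;  /* remember the address, not the pointer */
    free(a);
    void *b = malloc(64);          /* may land at the very same address */
    int reused = ((uintptr_t)b == old);
    free(b);
    return reused;
}
```

A separately allocated, monotonically assigned dict ID never repeats, which is what the ma_extra proposal buys over id().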
On Wed, 20 Jan 2016 at 10:46 Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Brett,
On 2016-01-20 1:22 PM, Brett Cannon wrote:
On Wed, 20 Jan 2016 at 10:11 Yury Selivanov <yselivanov.ml@gmail.com <mailto:yselivanov.ml@gmail.com>> wrote:
On 2016-01-18 5:43 PM, Victor Stinner wrote:
> Is someone opposed to this PEP 509?
>
> The main complaint was the change on the public Python API, but the PEP
> doesn't change the Python API anymore.
>
> I'm not aware of any remaining issue on this PEP.
Victor,
I've been experimenting with the PEP to implement a per-opcode cache in the ceval loop (I'll share my progress on that in a few days). This makes it possible to significantly speed up the LOAD_GLOBAL and LOAD_METHOD opcodes, to the point where they don't require any dict lookups at all. Some macro-benchmarks (such as chameleon_v2) demonstrate an impressive ~10% performance boost.
Ooh, now my brain is trying to figure out the design of the cache. :)
Yeah, it's tricky. I'll need some time to draft a comprehensible overview. And I want to implement a couple more optimizations and benchmark it better.
BTW, I have some updates (html5lib benchmark for py3, new benchmarks for calling C methods, and I want to port some PyPy benchmarks) to the benchmarks suite. Should I just commit them, or should I use bugs.python.org?
I actually emailed speed@ to see if people were interested in finally sitting down with all the various VM implementations at PyCon and trying to come up with a reasonable base set of benchmarks that better reflect modern Python usage, but I never heard back. Anyway, issues on bugs.python.org are probably best to talk about new benchmarks before adding them (fixes and updates to pre-existing benchmarks can just go in).
I rely on your dict->ma_version to implement cache invalidation.
However, besides guarding against version change, I also want to guard against the dict being swapped for another dict, to avoid situations like this:
def foo(): print(bar)

exec(foo.__code__, {'bar': 1}, {})
exec(foo.__code__, {'bar': 2}, {})
What I propose is to add a pointer "ma_extra" (same 64 bits), which will be set to NULL for most dict instances (instead of ma_version). "ma_extra" can then point to a struct that has a globally unique dict ID (uint64) and a version tag (uint64). Macros like PyDict_GET_ID and PyDict_GET_VERSION could then efficiently fetch the version/unique ID of the dict for guards.
"ma_extra" would also make it easier for us to extend dicts in the future.
Why can't you simply use the id of the dict object as the globally unique dict ID? It's already globally unique amongst all Python objects which makes it inherently unique amongst dicts.
We have a freelist for dicts -- so if the dict dies, there could be a new dict in its place, with the same id() and possibly even the same ma_version.
Ah, I figured it would be too simple to use something we already had.
While the probability of such hiccups is low, we still have to account for it.
Yep.
Yury Selivanov wrote:
What I propose is to add a pointer "ma_extra" (same 64 bits), which will be set to NULL for most dict instances (instead of ma_version). "ma_extra" can then point to a struct that has a globally unique dict ID (uint64) and a version tag (uint64).
Why not just use a single global counter for allocating dict version tags, instead of one per dict? -- Greg
On 2016-01-20 5:01 PM, Greg Ewing wrote:
Yury Selivanov wrote:
What I propose is to add a pointer "ma_extra" (same 64 bits), which will be set to NULL for most dict instances (instead of ma_version). "ma_extra" can then point to a struct that has a globally unique dict ID (uint64) and a version tag (uint64).
Why not just use a single global counter for allocating dict version tags, instead of one per dict?
Yeah, I think that's what we agreed on: https://mail.python.org/pipermail/python-dev/2016-January/142837.html The only advantage of the ma_extra pointer is that it allows adding more stuff to dicts in the future. Yury
2016-01-21 6:08 GMT+01:00 Yury Selivanov <yselivanov.ml@gmail.com>:
Yeah, I think that's what we agreed on: https://mail.python.org/pipermail/python-dev/2016-January/142837.html
The only advantage of ma_extra pointer is that it allows to add more stuff to dicts in the future.
I don't agree on ma_extra since I don't understand it :-) What is the advantage compared to a simple integer? If it's a pointer, doesn't it require dereferencing the pointer? You say that it can be NULL: does it mean that we also have to first test whether the pointer is NULL? Does it mean that we have to allocate a second memory block to store a version tag object? When do you create a version tag, and when not?

Note: PEP 509 proposes to use a 64-bit integer for the version on 32-bit systems to avoid integer overflow after a few seconds. I first proposed to use the size_t type (which has the size of a pointer) but it doesn't work.

I tried to fix FAT Python to handle your use case: a function defined in one namespace and run in a different namespace (especially for the builtins dictionary). It looks like I don't need the discussed change to a global version; FAT Python guards have a different design than your cache. But if a global counter doesn't make the implementation more complex and opens new kinds of optimization, I agree to change my PEP 509. Victor
On 11/01/16 16:49, Victor Stinner wrote:
Hi,
After a first round on python-ideas, here is the second version of my PEP. The main changes since the first version are that the dictionary version is no more exposed at the Python level and the field type now also has a size of 64-bit on 32-bit platforms.
The PEP is part of a serie of 3 PEP adding an API to implement a static Python optimizer specializing functions with guards. The second PEP is currently discussed on python-ideas and I'm still working on the third PEP.
If anyone wants to experiment (at the C, not Python, level) with dict versioning to optimise load-global/builtins, then you can do so without adding a version number. A "version" can be created by splitting the dict with "make_keys_shared" and then making the keys-object immutable by setting "dk_usable" to zero. This means that any change to the keys will force a keys-object change, but changes to the values will not. For many optimisations, this is what you want.

Using this trick: to read a global, check that the keys is the expected keys and read the value straight out of the values array at the known index. To read a builtin, check that the module keys is the expected keys and thus cannot shadow the builtins, then read the builtin as above.

I don't know how much help this will be for a static optimiser, but it could work well for a dynamic optimiser. I used this optimisation in HotPy for optimising object attribute lookups. Cheers, Mark.
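A standalone C model of Mark's trick. The names loosely follow CPython's split-dict internals (ma_keys, ma_values, dk_usable), but this is a sketch, not real CPython code: freezing the keys object means any key insertion must build a new keys object, so "same keys pointer" implies "same layout", and a global can be read straight out of the values array at a cached index.

```c
#include <stddef.h>

/* Toy split-dict: keys live in a shared, freezable object; values are
 * stored separately in an array parallel to the key slots. */
typedef struct {
    int dk_usable;      /* 0 => frozen: inserting a key must allocate new keys */
    /* the hash table of keys lives here in the real struct */
} DictKeys;

typedef struct {
    DictKeys *ma_keys;  /* changes in keys force a new keys object */
    void **ma_values;   /* value changes do NOT change ma_keys */
} Dict;

/* Guarded fast-path read: valid only while the cached keys object is
 * still the dict's keys object. Returns NULL to signal "guard failed,
 * fall back to a normal lookup". */
static void *load_global_fast(Dict *d, DictKeys *cached, size_t index) {
    if (d->ma_keys != cached)
        return NULL;
    return d->ma_values[index];
}
```

The point of the trick is visible in the model: updating a value in ma_values leaves ma_keys untouched, so the guard keeps passing, while adding or removing a key swaps ma_keys and makes the pointer comparison fail.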
participants (10)
- Andrew Barnert
- Barry Warsaw
- Brett Cannon
- Glenn Linderman
- Greg Ewing
- Gregory P. Smith
- Maciej Fijalkowski
- Mark Shannon
- Victor Stinner
- Yury Selivanov