[Python-ideas] thread safe dictionary initialisation from mappings: dict(non_dict_mapping)
Anselm Kruis
a.kruis at science-computing.de
Mon Nov 19 18:24:08 CET 2012
Hello,
I found the following annoying behaviour of dict(non_dict_mapping) and
dict.update(non_dict_mapping), if non_dict_mapping implements
collections.abc.Mapping but is not an instance of dict. In this case the
implementations of dict() and dict.update() use PyDict_Merge(PyObject
*a, PyObject *b, int override).
The essential part of PyDict_Merge(a,b, override) is
# update dict a with the content of mapping b.
keys = b.keys()
for key in keys:
...
a[key] = b.__getitem__(key)
This algorithm is susceptible to race conditions, if a second thread
modifies the source mapping b between "b.keys()" and b.__getitem__(key):
- If the second thread deletes an item from b, PyDict_Merge fails with a
KeyError exception.
- If the second thread inserts a new value and then modifies an existing
value, a contains the modified value but not the new value.
Of course the current behaviour is the best you can get with a "minimum
mapping interface".
My Goal
-------
I want to be able to implement a mapping so that "dict(mapping)" works
in an "atomic" way.
Requirements for a solution
---------------------------
- it does not modify the behaviour of existing code
- the performance must be similar to the current implementation
- simple to implement
- sufficiently generic
Proposal
--------
My idea is to define a new optional method "__update_other__(setter)"
for mappings. A minimal implementation of this method looks like:
def __update_other__(self, setter):
for key in self.keys():
setter(key, self[key])
Now it is possible to extend PyDict_Merge(PyObject *a, PyObject *b, int
override) to check, if the source mapping b implements
"__update_other__()". If it does, PyDict_Merge simply calls
b.__update_other__. Otherwise it falls back to the current
implementation using keys() and __getitem__().
Pseudo code for PyDict_Merge(PyObject *a, PyObject *b, int override)
if hasattr(b, "__update__other__") and callable(b.__update_other__):
if override:
b.__update_other__(a.__setitem__)
else:
b.__update_other__(a.setdefault)
return
# old code follows
keys = b.keys()
for key in keys:
a[key] = b.__getitem__(key)
Example
-------
A typical implementation of __update_other__ for a mapping with
concurrent access looks like:
def __update_other__(self, setter):
# perform an appropriate locking
with self._lock:
for key in self._keys:
setter(key, self._values[key])
Note 1:
The __update_other__(self, setter) interface hides the nature of the
caller. It is also possible to retrieve only the keys or only the values
using suitable setter arguments.
Note 2:
In order to preserve the behaviour of existing code, the method
__update_other__() must not be added to existing mapping implementations
(collections.abc.Mapping, UserDict, ...). But adding support for
__update_other__ to __init__() and update() of existing implementations
is OK.
Note 3:
If we add add a test for __update_other__ to PyDict_Merge in front of
the "PyDict_Check(b)" test this would also solve the issues described in
http://bugs.python.org/issue1615701. A sub-class of dict could implement
__update_other__ to prevent PyDict_Merge from accessing its underlying
dict implementation.
Note 4: Alternative approach
Obviously it is possible to write "dict(non_dict_mapping.items())" (or
even "dict(non_dict_mapping.iteritems())", if iteritems() is implemented
in a suitable way), but constructing the list of items is an expensive
operation. Additionally it is fairly common and pythonic to write
"dict(mapping)" to get a copy of mapping.
Anselm
--
Dipl. Phys. Anselm Kruis science + computing ag
Senior Solution Architect Ingolstädter Str. 22
email A.Kruis at science-computing.de 80807 München, Germany
phone +49 89 356386 874 fax 737 www.science-computing.de
--
Vorstandsvorsitzender/Chairman of the board of management:
Gerd-Lothar Leonhart
Vorstand/Board of Management:
Dr. Bernd Finkbeiner, Michael Heinrichs,
Dr. Arno Steitz, Dr. Ingrid Zech
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
Philippe Miltin
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196
More information about the Python-ideas
mailing list