[Python-ideas] thread safe dictionary initialisation from mappings: dict(non_dict_mapping)

Anselm Kruis a.kruis at science-computing.de
Mon Nov 19 18:24:08 CET 2012


Hello,

I found the following annoying behaviour of dict(non_dict_mapping) and 
dict.update(non_dict_mapping), if non_dict_mapping implements 
collections.abc.Mapping but is not an instance of dict. In this case the 
implementations of dict() and dict.update() use PyDict_Merge(PyObject 
*a, PyObject *b, int override).

The essential part of PyDict_Merge(a,b, override) is

# update dict a with the content of mapping b.
keys = b.keys()
for key in keys:
    ...
    a[key] = b.__getitem__(key)

This algorithm is susceptible to race conditions, if a second thread 
modifies the source mapping b between "b.keys()" and b.__getitem__(key):
- If the second thread deletes an item from b, PyDict_Merge fails with a
KeyError exception.
- If the second thread inserts a new value and then modifies an existing 
value, a contains the modified value but not the new value.

Of course the current behaviour is the best you can get with a "minimum 
mapping interface".


My Goal
-------
I want to be able to implement a mapping so that "dict(mapping)" works 
in an "atomic" way.


Requirements for a solution
---------------------------
- it does not modify the behaviour of existing code
- the performance must be similar to the current implementation
- simple to implement
- sufficiently generic


Proposal
--------
My idea is to define a new optional method "__update_other__(setter)" 
for mappings. A minimal implementation of this method looks like:

def __update_other__(self, setter):
   for key in self.keys():
      setter(key, self[key])

Now it is possible to extend PyDict_Merge(PyObject *a, PyObject *b, int 
override) to check, if the source mapping b implements 
"__update_other__()". If it does, PyDict_Merge simply calls 
b.__update_other__. Otherwise it falls back to the current 
implementation using keys() and __getitem__().

Pseudo code for PyDict_Merge(PyObject *a, PyObject *b, int override)

if hasattr(b, "__update__other__") and callable(b.__update_other__):
   if override:
     b.__update_other__(a.__setitem__)
   else:
     b.__update_other__(a.setdefault)
   return
# old code follows
keys = b.keys()
   for key in keys:
     a[key] = b.__getitem__(key)


Example
-------
A typical implementation of __update_other__ for a mapping with 
concurrent access looks like:

def __update_other__(self, setter):
   # perform an appropriate locking
   with self._lock:
     for key in self._keys:
       setter(key, self._values[key])


Note 1:
The __update_other__(self, setter) interface hides the nature of the 
caller. It is also possible to retrieve only the keys or only the values 
using suitable setter arguments.

Note 2:
In order to preserve the behaviour of existing code, the method 
__update_other__() must not be added to existing mapping implementations 
(collections.abc.Mapping, UserDict, ...). But adding support for 
__update_other__ to __init__() and update() of existing implementations 
is OK.

Note 3:
If we add add a test for __update_other__ to PyDict_Merge in front of 
the "PyDict_Check(b)" test this would also solve the issues described in 
http://bugs.python.org/issue1615701. A sub-class of dict could implement 
__update_other__ to prevent PyDict_Merge from accessing its underlying 
dict implementation.

Note 4: Alternative approach
Obviously it is possible to write "dict(non_dict_mapping.items())" (or 
even "dict(non_dict_mapping.iteritems())", if iteritems() is implemented 
in a suitable way), but constructing the list of items is an expensive 
operation. Additionally it is fairly common and pythonic to write 
"dict(mapping)" to get a copy of mapping.


Anselm

-- 
  Dipl. Phys. Anselm Kruis                       science + computing ag
  Senior Solution Architect                      Ingolstädter Str. 22
  email A.Kruis at science-computing.de             80807 München, Germany
  phone +49 89 356386 874  fax 737               www.science-computing.de
-- 
Vorstandsvorsitzender/Chairman of the board of management:
Gerd-Lothar Leonhart
Vorstand/Board of Management:
Dr. Bernd Finkbeiner, Michael Heinrichs, 
Dr. Arno Steitz, Dr. Ingrid Zech
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
Philippe Miltin
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196




More information about the Python-ideas mailing list