[Python-ideas] thread safe dictionary initialisation from mappings: dict(non_dict_mapping)

Nick Coghlan ncoghlan at gmail.com
Tue Nov 20 12:20:03 CET 2012


On Tue, Nov 20, 2012 at 7:33 PM, Anselm Kruis
<a.kruis at science-computing.de> wrote:

> On 19.11.2012 22:09, Terry Reedy wrote:
>
>  On 11/19/2012 12:24 PM, Anselm Kruis wrote:
>>
>>> Hello,
>>>
>>> I found the following annoying behaviour of dict(non_dict_mapping) and
>>> dict.update(non_dict_mapping), if non_dict_mapping implements
>>> collections.abc.Mapping but is not an instance of dict. In this case the
>>> implementations of dict() and dict.update() use PyDict_Merge(PyObject
>>> *a, PyObject *b, int override).
>>>
>>> The essential part of PyDict_Merge(a,b, override) is
>>>
>>> # update dict a with the content of mapping b.
>>> keys = b.keys()
>>> for key in keys:
>>>     ...
>>>     a[key] = b.__getitem__(key)
>>>
>>> This algorithm is susceptible to race conditions, if a second thread
>>> modifies the source mapping b between "b.keys()" and b.__getitem__(key):
>>> - If the second thread deletes an item from b, PyDict_Merge fails with a
>>> KeyError exception.
>>> - If the second thread inserts a new value and then modifies an existing
>>> value, a contains the modified value but not the new value.
>>>
>>
>> It is well-known that mutating a collection while iterating over it can
>> lead to unexpected or undesired behavior, including exceptions. This is
>> not limited to updating a dict from a non-dict source. The generic answer
>> is Don't Do That.
>>
>
> Actually that's not the case here: the implementation of dict does not
> iterate over the collection while another thread mutates the collection. It
> iterates over a list of the keys and this list does not change.


Whether or not the keys() method makes a copy of the underlying keys is
entirely up to the collection - e.g. the Python 3 dict type returns a live
view of the underlying dictionary from keys()/values()/items() rather than
returning a list copy as it did in Python 2.
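
A minimal sketch of that difference, assuming Python 3 and a plain builtin
dict (the variable names are only illustrative):

```python
d = {"a": 1, "b": 2}

live_keys = d.keys()      # Python 3: a view onto d, not a copy
snapshot = list(d)        # an explicit copy of the keys at this moment

d["c"] = 3

print(sorted(live_keys))  # the view reflects the key added afterwards
print(snapshot)           # the snapshot does not

# Iterating over a live view while the dict changes size fails outright.
error = None
try:
    for key in d.keys():
        d[key + "_copy"] = d[key]
except RuntimeError as exc:
    error = exc
    print("RuntimeError:", exc)
```

So code that calls keys() and then looks up each key is only working from a
stable list if the collection chooses to return one.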

Building and working with containers in a thread safe manner is inherently
challenging, and given that the standard Python containers only make
minimal efforts in that direction (relying heavily on the GIL and the way
it interacts with components written in C) it's unlikely you're ever going
to achieve adequate results without exposing an explicit locking API. For
example, you could make your container a context manager, so people could
write:

    with my_threadsafe_mapping:
        dict_copy = dict(my_threadsafe_mapping)

This has the advantage of making it easy to serialise *any* multi-step
operation on your container on its internal lock, not just the specific
case of copying to a builtin dictionary.
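
A rough sketch of what such a container might look like. The class name
LockedMapping and its attributes are hypothetical, and the choice of
threading.RLock is an assumption, made so that the individual methods can
re-acquire the lock while the caller already holds it in a "with" block:

```python
import threading
from collections.abc import MutableMapping


class LockedMapping(MutableMapping):
    """Hypothetical dict-backed mapping guarded by a re-entrant lock."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        # Re-entrant, so methods still work inside "with mapping:".
        self._lock = threading.RLock()

    # Context manager protocol: hold the lock for a multi-step operation.
    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False  # do not swallow exceptions

    def __getitem__(self, key):
        with self._lock:
            return self._data[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._data[key] = value

    def __delitem__(self, key):
        with self._lock:
            del self._data[key]

    def __iter__(self):
        # Iterate over a snapshot of the keys taken under the lock.
        with self._lock:
            return iter(list(self._data))

    def __len__(self):
        with self._lock:
            return len(self._data)


my_threadsafe_mapping = LockedMapping(a=1, b=2)

# keys() and every __getitem__ call made by dict() run under one lock hold.
with my_threadsafe_mapping:
    dict_copy = dict(my_threadsafe_mapping)
```

Each single operation is still locked on its own, so the "with" block is only
needed when several calls have to be seen as one atomic step.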

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia