[Python-Dev] [RFC] Removing pure Python implementation of OrderedDict
INADA Naoki
songofacandy at gmail.com
Tue Sep 5 14:20:43 EDT 2017
First of all, I saw enough -1 so I gave up about removing.
Following reply is just a technical topic.
On Wed, Sep 6, 2017 at 2:13 AM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
[snip]
>
> Like Antoine, I consider the pure Python implementation to be
> valuable. Furthermore, the pure Python implementation is the
> reference, so its behavior is idiomatic.
>
What is *idiomatic*? For example, in this test case:
https://github.com/python/cpython/blob/564a2c68add64ebf2e558a54f5697513b19293cb/Lib/test/test_ordered_dict.py#L575-L582
def test_dict_delitem(self):
OrderedDict = self.OrderedDict
od = OrderedDict()
od['spam'] = 1
od['ham'] = 2
dict.__delitem__(od, 'spam')
with self.assertRaises(KeyError):
repr(od)
Since dict.__delitem__ is same to OrderedDict.__delitem__ in PyPy
implementation,
repr(od) doesn't raise KeyError.
>>>> import collections
>>>> od = collections.OrderedDict()
>>>> od['spam'] = 1
>>>> od['ham'] = 2
>>>> dict.__delitem__(od, 'spam')
>>>> repr(od)
"OrderedDict([('ham', 2)])"
Do you think this behavior is not idiomatic?
I feel both are OK and there are no idiomatic behavior.
The tested behavior seems just an implementation detail.
At least, PyPy modified this test as I wrote in previous mail.
It makes sense to me.
>> ### Thread safety
>>
>> AFAIK, there are no thread safety guarantee in OrderedDict.
>> I don't look carefully, but some methods seems thread unsafe.
>
> What isn't thread-safe? I know that Raymond has a good understanding
> of this area. For instance, he was very clear about re-entrancy
> concerns when I was working on the C implementation. I recommend
> getting feedback from him on this. FWIW, I don't recall any bugs
> related to thread-safety in OrderedDict, even though it's been around
> a while.
For example, when one thread do `od[k1] = v1` and another thread do
`od[k2] = v2`,
result should be equivalent to one of `od[k1] = v1; od[k2] = v2;` or
`od[k2] = v2; od[k1] = v1`. And internal data structure should be consistent.
But see current Pure Python implementation:
def __setitem__(self, key, value,
dict_setitem=dict.__setitem__, proxy=_proxy, Link=_Link):
'od.__setitem__(i, y) <==> od[i]=y'
# Setting a new item creates a new link at the end of the linked list,
# and the inherited dictionary is updated with the new key/value pair.
if key not in self:
self.__map[key] = link = Link()
root = self.__root
last = root.prev
link.prev, link.next, link.key = last, root, key
last.next = link
root.prev = proxy(link)
dict_setitem(self, key, value)
If thread is switched right after `root = self.__root`, either k1 or k2 is
disappeared in linked list.
More worth, if thread switch is happen right after `last = root.prev`,
linked list will be broken.
Fixing such issue is not easy job. See
http://bugs.python.org/issue14976 for example.
On the other hand, thanks to GIL, C implementation (not only mine, but
also your's)
doesn't have such issue.
Sometime, writing reentrant and threadsafe container in C is easier than
in Pure Python.
Regards,
P.S.
Since it's over 3 a.m. in Japan, I'll read and reply other mails tomorrow.
Sorry for delay.
More information about the Python-Dev
mailing list