[Python-Dev] [RFC] Removing pure Python implementation of OrderedDict

Tue Sep 5 14:20:43 EDT 2017

First of all, I saw enough -1 so I gave up about removing.
Following reply is just a technical topic.

On Wed, Sep 6, 2017 at 2:13 AM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
[snip]
>
> Like Antoine, I consider the pure Python implementation to be
> valuable.  Furthermore, the pure Python implementation is the
> reference, so its behavior is idiomatic.
>

What is *idiomatic*?  For example, in this test case:

https://github.com/python/cpython/blob/564a2c68add64ebf2e558a54f5697513b19293cb/Lib/test/test_ordered_dict.py#L575-L582

    def test_dict_delitem(self):
        OrderedDict = self.OrderedDict
        od = OrderedDict()
        od['spam'] = 1
        od['ham'] = 2
        dict.__delitem__(od, 'spam')
        with self.assertRaises(KeyError):
            repr(od)

Since dict.__delitem__ is same to OrderedDict.__delitem__ in PyPy
implementation,
repr(od) doesn't raise KeyError.

    >>>> import collections
    >>>> od = collections.OrderedDict()
    >>>> od['spam'] = 1
    >>>> od['ham'] = 2
    >>>> dict.__delitem__(od, 'spam')
    >>>> repr(od)
    "OrderedDict([('ham', 2)])"

Do you think this behavior is not idiomatic?

I feel both are OK and there are no idiomatic behavior.
The tested behavior seems just an implementation detail.

At least, PyPy modified this test as I wrote in previous mail.
It makes sense to me.

>> ### Thread safety
>>
>> AFAIK, there are no thread safety guarantee in OrderedDict.
>> I don't look carefully, but some methods seems thread unsafe.
>
> What isn't thread-safe?  I know that Raymond has a good understanding
> of this area.  For instance, he was very clear about re-entrancy
> concerns when I was working on the C implementation.  I recommend
> getting feedback from him on this.  FWIW, I don't recall any bugs
> related to thread-safety in OrderedDict, even though it's been around
> a while.

For example, when one thread do `od[k1] = v1` and another thread do
`od[k2] = v2`,
result should be equivalent to one of `od[k1] = v1; od[k2] = v2;` or
`od[k2] = v2; od[k1] = v1`.  And internal data structure should be consistent.

But see current Pure Python implementation:

    def __setitem__(self, key, value,
                    dict_setitem=dict.__setitem__, proxy=_proxy, Link=_Link):
        'od.__setitem__(i, y) <==> od[i]=y'
        # Setting a new item creates a new link at the end of the linked list,
        # and the inherited dictionary is updated with the new key/value pair.
        if key not in self:
            self.__map[key] = link = Link()
            root = self.__root
            last = root.prev
            link.prev, link.next, link.key = last, root, key
            last.next = link
            root.prev = proxy(link)
        dict_setitem(self, key, value)

If thread is switched right after `root = self.__root`, either k1 or k2 is
disappeared in linked list.
More worth, if thread switch is happen right after `last = root.prev`,
linked list will be broken.

Fixing such issue is not easy job.  See
http://bugs.python.org/issue14976 for example.

On the other hand, thanks to GIL, C implementation (not only mine, but
also your's)
doesn't have such issue.

Sometime, writing reentrant and threadsafe container in C is easier than
in Pure Python.

Regards,

P.S.
Since it's over 3 a.m. in Japan, I'll read and reply other mails tomorrow.
Sorry for delay.