[New-bugs-announce] [issue27575] dict viewkeys intersection slow for large dicts

David Su report at bugs.python.org
Tue Jul 19 13:51:47 EDT 2016


New submission from David Su:

In dictobject.c, the function dictviews_and performs a very expensive set creation operation on the keys of the self dict object:

>From Python 2.7:
...
static PyObject*
dictviews_and(PyObject* self, PyObject *other)
{
    PyObject *result = PySet_New(self);
...
    tmp = PyObject_CallMethod(result, "intersection_update", "(O)", other);
...

Similar logic can be found in the Python 3 code as well.


For large dicts (~10 000 000 keys), it takes a significant amount of time to copy the keys into a new set, and also wastes a lot of memory. This issue is amplified when the intersection operation is performed many times.

This implementation uses O(len(self)) time and space, while it is expected to use O(min(len(self), len(other)) time and space.


Solution 1: Manually copy the keys of the dict into a set, and perform all intersection operations on that set. However, still wastes lots of memory. 

Solution 2 (Recommended): Port the intersection logic from the set class to the dict view class.

----------
components: Library (Lib)
messages: 270836
nosy: David Su2
priority: normal
severity: normal
status: open
title: dict viewkeys intersection slow for large dicts
type: performance
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue27575>
_______________________________________


More information about the New-bugs-announce mailing list