[issue1022] dict.update significantly slower than series of dict.__setitem__
New submission from Xavier Morel <bugs.pypy.org@masklinn.net>:

In CPython, as the number of keys grows, dict.update settles at roughly half the speed of an equivalent sequence of dict[key] = value assignments:

    > python2.7 -m timeit -s 'd = {}' 'd["foo"] = 3' 'd["bar"] = 4' 'd["baz"] = 5' 'd["qux"] = 6' 'd["quux"] = 7'
    1000000 loops, best of 3: 0.395 usec per loop
    > python2.7 -m timeit -s 'd = {}' 'd.update(foo=3, bar=4, baz=5, qux=6, quux=7)'
    1000000 loops, best of 3: 0.657 usec per loop

In PyPy, on the other hand, the difference in speed is on the order of 20x, even when setting only 5 keys. dict.update runs at ~30% of the speed of the equivalent call in CPython (almost 2 µsec/loop against CPython's 0.66 µsec/loop), while PyPy's __setitem__ is more than 4 times faster than CPython's:

    > pypy-c -m timeit -s 'd = {}' 'd["foo"] = 3' 'd["bar"] = 4' 'd["baz"] = 5' 'd["qux"] = 6' 'd["quux"] = 7'
    10000000 loops, best of 3: 0.0877 usec per loop
    > pypy-c -m timeit -s 'd = {}' 'd.update(foo=3, bar=4, baz=5, qux=6, quux=7)'
    100000 loops, best of 3: 1.95 usec per loop

That kind of bothers me, as I tend to use dict.update over long sequences of setitem for readability purposes. Now of course half a µsec per call is not a big difference in absolute terms, but still…

Versions information:
* OSX 10.6.8
* CPython 2.7.2 (Macports)
* Pypy 1.7.0 Python 2.7.1 (?, Nov 24 2011, 10:57:50) (Macports)

----------
messages: 3814
nosy: masklinn, pypy-issue
priority: performance bug
status: unread
title: dict.update significantly slower than series of dict.__setitem__

________________________________________
PyPy bug tracker <tracker@bugs.pypy.org>
<https://bugs.pypy.org/issue1022>
________________________________________
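The comparison in the report can also be scripted with the standard timeit module, which makes it easier to run the same measurement under several interpreters. This is only a sketch, not the reporter's setup: the helper names (with_setitems, with_update, measure) are made up for illustration, and wrapping the statements in functions adds call overhead that the original command-line runs do not have, so absolute numbers will differ from those quoted above.

```python
import timeit


def with_setitems(d):
    # Five individual item assignments, as in the first timeit command.
    d["foo"] = 3
    d["bar"] = 4
    d["baz"] = 5
    d["qux"] = 6
    d["quux"] = 7
    return d


def with_update(d):
    # One dict.update call with keyword arguments, as in the second command.
    d.update(foo=3, bar=4, baz=5, qux=6, quux=7)
    return d


def measure(func, number=100000, repeat=3):
    """Return the best-of-`repeat` time per call, in microseconds."""
    d = {}  # reused across calls, like `-s 'd = {}'` in the report
    timer = timeit.Timer(lambda: func(d))
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6


if __name__ == "__main__":
    for func in (with_setitems, with_update):
        print("%-14s %.4f usec per call" % (func.__name__, measure(func)))
```

Running the script under each interpreter of interest gives a per-call figure for both variants; only the ratio between the two lines is meaningful, since both carry the same wrapper overhead.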
Fijal <fijall@gmail.com> added the comment:

How about we move dict.update to applevel?

----------
nosy: +fijal
status: unread -> chatting
Armin Rigo <armin.rigo@gmail.com> added the comment:

It probably doesn't help: the issue seems to be the slowness of "**kwds" argument passing. If you're thinking about app-level, then it would be code like this:

    def update(d, **kwds):
        for key, value in kwds.items():   # or iteritems()
            d[key] = value

But that would be much slower than a series of direct setitems, and most probably also slower than the RPython equivalent that we have now. To fix this we need to think about how to improve __args__.parse_obj() in a way that lets it enumerate the keywords without actually building a w_kwds dictionary.

----------
nosy: +arigo
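The cost attributed to "**kwds" above can be illustrated at the Python level. The sketch below repeats the app-level update from the comment and adds a hypothetical helper, kwds_identity, to show that every call receiving "**kwds" is handed a freshly built dictionary, which is the allocation the comment would like to avoid. It demonstrates language-level behaviour only; it does not reproduce __args__.parse_obj() or any other PyPy interpreter-level code.

```python
def update(d, **kwds):
    # `kwds` is a new dict assembled by the call machinery on every call;
    # its contents are then copied into `d` one item at a time.
    for key, value in kwds.items():
        d[key] = value


def kwds_identity(**kwds):
    # Hypothetical helper: returns the dict the callee actually receives.
    return kwds


d = {}
update(d, foo=3, bar=4, baz=5, qux=6, quux=7)

# Two calls with identical keywords yield equal but distinct dict objects.
first = kwds_identity(foo=3)
second = kwds_identity(foo=3)
assert first == second and first is not second

# Unpacking an existing dict with ** also produces a copy, not the original.
source = {"foo": 3}
assert kwds_identity(**source) is not source
```

So an update written this way pays for building the temporary dictionary and then for iterating over it, on top of the five stores that a series of direct setitems performs.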
Carl Friedrich Bolz <cfbolz@gmx.de> added the comment:

By now our **args handling is much better, so the .update version is only 4x slower than individual setitems. That seems reasonable to me, so I am closing for now. If somebody really needs this to be a lot faster, something could be unrolled; please reopen the issue in that case.

----------
nosy: +cfbolz
status: chatting -> resolved
participants (4)
- Armin Rigo
- Carl Friedrich Bolz
- Fijal
- Xavier Morel