[IronPython] More Performance comparisons - dictionary updates and tuples

Dino Viehland dinov at exchange.microsoft.com
Thu Apr 17 01:52:33 CEST 2008


I'll have to look at these more closely, but I suspect fixing dict update performance will be a trivial change.  We currently go through an overly generic code path that works for any IDictionary.  Unfortunately, when the source is a PythonDictionary that path makes a copy of all the members into a list, gets an enumerator for that list, and then copies the members into the new dictionary.  Not exactly the fastest thing in the world, and likely a regression caused directly by the new dictionary implementation...
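
Roughly, in Python terms (just a sketch of the shape of the problem - the
real code is C# inside IronPython, and these function names are made up),
the difference between the two paths looks like this:

# Rough sketch of the two update paths (illustration only; the actual
# implementation is C# inside IronPython, and these names are invented).

def generic_update(target, source):
    # Generic IDictionary path: snapshot every member into a list first,
    # then enumerate that list and copy each pair across - an extra O(n)
    # copy and allocation before the target is touched at all.
    snapshot = list(source.iteritems())
    for key, value in snapshot:
        target[key] = value

def direct_update(target, source):
    # What a PythonDictionary-aware path could do instead: enumerate the
    # source once and write straight into the target, no intermediate list.
    for key, value in source.iteritems():
        target[key] = value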

The tuple hashing is better in Beta 1 than it was before (due to your feedback! :)) but it's still not perfect.  I ran into a test case in the standard library the other day that verifies the number of collisions stays below 15 or so, and we were still getting thousands of collisions in that test.  So that still deserves another iteration.
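
For reference, here's a quick way to count the sort of collisions that
test is looking at (just a sketch of the idea, not the actual standard
library test):

# Count hash-bucket collisions for small (x, y) tuples - Python 2 style,
# matching the benchmark script quoted below.  Illustration only; this is
# not the standard library test itself.
def count_collisions(keys, table_size=2 ** 16):
    buckets = {}
    for key in keys:
        slot = hash(key) % table_size
        buckets[slot] = buckets.get(slot, 0) + 1
    # Every key beyond the first in a bucket counts as a collision.
    return sum(count - 1 for count in buckets.itervalues() if count > 1)

keys = [(x, y) for x in range(100) for y in range(100)]
print count_collisions(keys)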

The regressions w/ ints and tuple assignment certainly deserve a good investigation as well - hopefully it's just something silly that got broken. :)

I'll try to get to these tomorrow, but if I don't I'll open a bug so they don't get lost.

Are there any other issues that you or anyone else would like to see addressed for 1.1.2?  We currently have only two other issues on CodePlex marked for 1.1.2.

-----Original Message-----
From: users-bounces at lists.ironpython.com [mailto:users-bounces at lists.ironpython.com] On Behalf Of Michael Foord
Sent: Wednesday, April 16, 2008 10:33 AM
To: Discussion of IronPython
Subject: [IronPython] More Performance comparisons - dictionary updates and tuples

Hello guys,

I've been looking at performance in Resolver One. (Object creation in
IronPython seems to be really good, whereas dictionary lookups are not
so good when compared against CPython.)

It turns out that we are getting bitten quite badly by the performance
of hashing tuples (fixing this for IP 1.1.2 would be *great*). I did
some profiling and have some comparisons - and can also show a
regression in performance in IP 2 (Beta 1) when using ints as dictionary
keys in an update operation. I thought I would post the results as they
may be useful.

(Dictionary update in IP 1 with tuple keys is an order of magnitude
slower than CPython. IP 2 is also an order of magnitude slower, though
still roughly twice as fast as IP 1 at the dictionary update - but it is
about twice as *slow* as IP 1 at tuple creation and unpacking.)

Results first:

CPython
e:\Dev>timeit1.py
tuple_create_and_unpack took 220.999956131 ms
dict_update took 541.000127792 ms


IP 1.1.1
e:\Dev>e:\Dev\ironpython1\ipy.exe timeit1.py
tuple_create_and_unpack took 680.9792 ms
dict_update took 7891.3472 ms


IP 2 Beta 1
e:\Dev>e:\Dev\ironpython2\ipy.exe timeit1.py
tuple_create_and_unpack took 1341.9296 ms
dict_update took 4756.84 ms


If we switch to using integers rather than tuples for the dictionary
keys, the performance changes:

CPython
e:\Dev>timeit1.py
tuple_create_and_unpack took 200.000047684 ms
dict_update took 230.999946594 ms


IP 1.1.1
e:\Dev>e:\Dev\ironpython1\ipy.exe timeit1.py
tuple_create_and_unpack took 911.3104 ms
dict_update took 420.6048 ms


IP 2 Beta 1
e:\Dev>e:\Dev\ironpython2\ipy.exe timeit1.py
tuple_create_and_unpack took 971.3968 ms
dict_update took 1582.2752 ms


With ints as keys, IP 1 is only half the speed of CPython - but IP 2 is
four times slower than IP 1!

The code used, which runs under both CPython and IronPython:


from random import random

try:
    # Running under IronPython: clr is available, so time with DateTime.
    import clr
    from System import DateTime

    def timeit(func):
        start = DateTime.Now
        func()
        end = DateTime.Now
        print func.__name__, 'took %s ms' % (end - start).TotalMilliseconds

except ImportError:
    # Running under CPython: fall back to time.time().
    import time

    def timeit(func):
        start = time.time()
        func()
        end = time.time()
        print func.__name__, 'took %s ms' % ((end - start) * 1000)


def tuple_create_and_unpack():
    for val in range(1000000):
        a, b = val, val + 1

# d1 gets 10,000 entries and d2 gets 1,000,000, all keyed by (x, y) tuples.
d1 = {}
for x in range(100):
    for y in range(100):
        d1[x, y] = random()

d2 = {}
for x in range(1000):
    for y in range(1000):
        d2[x, y] = random()

def dict_update():
    d1.update(d2)

timeit(tuple_create_and_unpack)
timeit(dict_update)
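
The listing above is the tuple-key version. For the integer-key runs the
setup loops presumably change along these lines (that variant of the
script isn't shown, so this is only a sketch):

# Hypothetical int-key variant of the dictionary setup.  The int-key
# script isn't included above, so this is only a guess at the change:
# each (x, y) pair is folded into a single integer key.
d1 = {}
for x in range(100):
    for y in range(100):
        d1[x * 100 + y] = random()

d2 = {}
for x in range(1000):
    for y in range(1000):
        d2[x * 1000 + y] = random()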


Michael Foord
http://www.ironpythoninaction.com/