why are these exec times so different?

Skip Montanaro skip at pobox.com
Wed Dec 17 16:11:39 EST 2003


    res> To show that what I'm seeing isn't because of local/global
    res> differences, here are dummy functions that exhibit the same
    res> behavior.  (And I've tried to minimize os background activity, etc,
    res> when testing.)

    res> The results are consistently repeatable: For the given string s,
    res> f1(s) takes >10 times longer to run than f2(s) -- f1(s) and f2(s)
    res> return the same value, and contain essentially the same code,
    res> except that f2 calls function g to do the common inner loop:

    ...

On my machine, f2 runs about twice as fast as f1.  The only obvious
difference I can see is that in f1 D grows to length 491678, while in g the
dictionary D is always much smaller than that.  The resizing cost of f1.D
may outweigh the creation costs of all the smaller g.D instances.

Here's a slightly modified version of your script which prints the
distribution of g.D sizes (modulo 10) at the end of f2.  Those which are
very small won't need to be resized at all.  A dictionary of length 490000
will have to be resized quite a bit.  Also, the larger dictionary (depending
on how the key hashing works) may incur a much larger number of collisions
than the many smaller dictionaries, further decreasing performance.

Skip

def f1(s):
    """ do outer & inner loop """
    D = {}
    for i in xrange(len(s)):
        for j in xrange(len(s)-i+1):
            D[s[j:j+i]] = True
    return len(D)

def f2(s):
    """ same as f1, but call g to do inner loop """
    count = 0
    dsizes = {}
    for i in xrange(len(s)):
        c = g(s,i)
        n = dsizes.get(c%10, 0) + 1
        dsizes[c%10] = n
        count += c
    print dsizes
    return count

def g(s,i):
    """ the common inner loop """
    D = {}
    for j in xrange(len(s)-i+1):
        D[s[j:j+i]] = True
    return len(D)

def string01(n):
    """Return first n chars of a fixed pseudorandom string."""
    random.seed(0)
    return ''.join([ random.choice('01') for i in xrange(n) ])

import random, time
s = string01(1000)
for func in (f1, f2):
    t0 = time.clock()
    count = func(s)
    t1 = time.clock()
    print func.func_name, count, t1-t0





More information about the Python-list mailing list