something Evil happens when large hashes destroyed

guuge guuge at localhost.localhost
Sun Nov 18 11:01:48 EST 2001


I was trying to sort about 100,000 items by splitting them into
groups (using a python dictionary type) and then successively splitting
these groups until they were small enough to use a brute force method.

My test program, which used only two or three keys to create the first
split, worked fine.  When I tried the real thing, using 256 keys, the
program slowed to a crawl.  The python interpreter took forever to
destroy the dictionaries.

Here's a little program that will place X number of elements into
Y number of bins and gives the time used to do the placement and 
the time used to destroy the dictionary.

-----------------------------------------------------
#!/usr/bin/env python
import os, sys, string, time

class Element:
    def __init__ (self, val):
        self.val = val

def usage ():
    print "%s <number of keys in hash> <number of elemets>" \
           % os.path.basename(sys.argv[0])
    sys.exit (1)

def main ():
    if len(sys.argv) < 3:
        usage ()
    numKeys = string.atoi (sys.argv[1])
    numElements = string.atoi (sys.argv[2])
    bigHash = {}
    createStartTime = time.time ()
    for i in range(numElements):
        key = i % numKeys
        e = Element (i)
        if bigHash.has_key(key):
            bigHash[key].append (e)
        else:
            bigHash[key] = [e]
    print "created %d elements and placed into %d bins in %d seconds" % \
          (numElements, numKeys, time.time() - createStartTime)
    destroyStartTime = time.time ()
    return destroyStartTime

if __name__ == "__main__":
    destroyStartTime = main ()
    print "hash destroyed in %d seconds" % (time.time() - destroyStartTime)

-----------------------------------------------------

Here is some timing data for 100,000 elements on a pentium 166 MMX 
with 140 MB of ram.  Time is in seconds.

First column is the number of keys to use.
Second column is the time used to place the elements in the dictionary.
The third column is the time used to destroy the dictionary.


python 1.5.2:
1          34   1
2          36   2
3          37   2
4          38   3
5          37   2
6          38   122             <---- over 2 minutes to destroy a Dict !!!
7          38   189
8          36   126
9          37   110

python 2.0.1:
1          50   2
2          51   2
3          53   2
4          55   3
5          52   156             <----     :(
6          23   100
7          51   218
8          52   289
9          53   177

python 2.2b2:
1          34   2
2          36   2
3          35   2
4          36   319             <----  Ouch !  Five minutes!!!
5          37   209
6          36   147
7          36   117
8          36   91
9          34   62




More information about the Python-list mailing list