[Patches] [Patch #102733] {}.popitem() implementation

noreply@sourceforge.net noreply@sourceforge.net
Fri, 8 Dec 2000 10:58:28 -0800


Patch #102733 has been updated. 

Project: python
Category: core (C code)
Status: Open
Submitted by: gvanrossum
Assigned to : tim_one
Summary: {}.popitem() implementation

Follow-Ups:

Date: 2000-Dec-08 10:28
By: gvanrossum

Comment:
Tim (or anyone else): can you improve this?

It has excellent performance as long as you only do a single dictionary at a time, but goes quadratic if you call popitem() for two identical dictionaries in lock-step.

Maybe jumping criss-cross through the hash table like lookdict does would improve that; but I don't understand the math used for that ("Cycle through GF(2^n)-{0}" ???).

Here's a test program:

import time

for run in range(1000):
    print "run =", run
    for log2size in range(10, 18):
        size = 2**log2size
        print "log2size =", log2size,
        print "size =", size
        a = {}
        t0 = time.clock()
        while 1:
            i = len(a)
            if i >= size:
                break
            a[`i`] = i
        t1 = time.clock()
        print "%.1f usec per item to build (total %.3f sec)" % (
            (1e6*(t1-t0)/size), t1-t0)
        b = a.copy()
        t0 = time.clock()
        try:
            while 1:
                a.popitem()
                b.popitem()
        except KeyError:
            pass
        t1 = time.clock()
        print "%.1f usec per item to destroy twins (total %.3f sec)" % (
            (1e6*(t1-t0)/size), t1-t0)
        assert not a, a
        assert not b, b

-------------------------------------------------------

Date: 2000-Dec-08 10:58
By: tim_one

Comment:
While I haven't yet actually tried it, I'm pretty sure you'd get much better performance (linear) in your test case by changing "finger = i+1" to "finger = i", and no matter how many dicts you march over in lockstep.  In the case of a single dict, it adds a trivial amount of overhead per lookup (one extra C-speed trip around the loop per lookup).

The reason "GF(2^n)-{)}" confuses you is because he should have written "**" instead of "^" <wink>.  In essense, it means "visit each of the 2**n bit patterns exactly once, except for the all-zero pattern", and the "GF" means Galois Field, the theoretical basis for why the bit manipulations actually achieve that.  I don't believe it would help:  the sequence of bit patterns visited remains fixed, and if you continue to move the finger "one beyond" you'll get the same problem (the first pass thru two lockstep iters creates gaps that the second pass has to skip over; the second pass doubles the size of those gaps; the third pass doubles them again, etc).

I agree that sharing one finger is very attractive.  +1.
-------------------------------------------------------

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102733&group_id=5470