[Patches] [Patch #102733] {}.popitem() implementation

noreply@sourceforge.net noreply@sourceforge.net
Tue, 12 Dec 2000 14:03:10 -0800


Patch #102733 has been updated. 

Project: python
Category: core (C code)
Status: Closed
Submitted by: gvanrossum
Assigned to : gvanrossum
Summary: {}.popitem() implementation

Follow-Ups:

Date: 2000-Dec-12 14:03
By: gvanrossum

Comment:
OK, checked in.

-------------------------------------------------------

Date: 2000-Dec-10 15:12
By: tim_one

Comment:
+1.  I think .popitem() is a good addition, providing a (now predictably) efficient way to do something useful that couldn't be done efficiently before, and with no bad effects on time or space use elsewhere.  I don't see any real point to additional .popkey() or .popvalue() methods.

The patch is missing:

+ Docs.
+ Test cases.
+ docstrings (although all of dictobject.c is missing docstrings).

I would really like to understand why the original suggestion did good for you.  For starters, which platform did you run it on?  I don't see anything in the code that should vary across 32-bit platforms (not the string hash codes, or the GF-cycling business, ... even computing the initial value of incr is careful to cast hash to unsigned before the right shift).

On the platform where it did help, what does this print (w/ the current patch -- this will reveal the internal storage order for a and b)?

N = 128
ints = range(N)
a = {}
for i in ints:
    a[`i`] = i
b = a.copy()
aorder = [a.popitem()[1] for i in ints]
border = [b.popitem()[1] for i in ints]
assert not a
assert not b
for aval, bval in zip(aorder, border):
    print "%6d %6d" % (aval, bval),
    if aval == bval:
        print "ok"
    else:
        print "******* out of synch *******"

Here's what I get:

    34     34 ok
    35     35 ok
    36     36 ok
    37     37 ok
    30     30 ok
    31     31 ok
    32     32 ok
    33     33 ok
    38     95 ******* out of synch *******
    39     38 ******* out of synch *******
    78     39 ******* out of synch *******
    79     78 ******* out of synch *******
   107     79 ******* out of synch *******
   106    107 ******* out of synch *******
   101    106 ******* out of synch *******
   100    101 ******* out of synch *******
   103    100 ******* out of synch *******
   102    103 ******* out of synch *******
    70    102 ******* out of synch *******
    71     70 ******* out of synch *******
    72     71 ******* out of synch *******
    73     72 ******* out of synch *******
    74     73 ******* out of synch *******
    75     74 ******* out of synch *******
    76     75 ******* out of synch *******
    77     76 ******* out of synch *******
    59     77 ******* out of synch *******
    54    108 ******* out of synch *******
    55      9 ******* out of synch *******
    58      7 ******* out of synch *******
     9      5 ******* out of synch *******
     7      3 ******* out of synch *******
     5      1 ******* out of synch *******
     3     41 ******* out of synch *******
     1     40 ******* out of synch *******
    41     43 ******* out of synch *******
    40     42 ******* out of synch *******
    43     45 ******* out of synch *******
    42     44 ******* out of synch *******
    45     47 ******* out of synch *******
    44     46 ******* out of synch *******
    47     49 ******* out of synch *******
    46     48 ******* out of synch *******
    49     85 ******* out of synch *******
    48     84 ******* out of synch *******
    85     87 ******* out of synch *******
    84     86 ******* out of synch *******
    87     81 ******* out of synch *******
    86     80 ******* out of synch *******
    81     83 ******* out of synch *******
    80     82 ******* out of synch *******
    83    112 ******* out of synch *******
    82    113 ******* out of synch *******
   112    110 ******* out of synch *******
   113    111 ******* out of synch *******
   110     89 ******* out of synch *******
   111     88 ******* out of synch *******
    89    114 ******* out of synch *******
    88    115 ******* out of synch *******
   114     99 ******* out of synch *******
   115    124 ******* out of synch *******
   122     52 ******* out of synch *******
   120     53 ******* out of synch *******
   117     50 ******* out of synch *******
    52     51 ******* out of synch *******
    53     56 ******* out of synch *******
    50     57 ******* out of synch *******
    51     54 ******* out of synch *******
    56     55 ******* out of synch *******
    57     16 ******* out of synch *******
    18     17 ******* out of synch *******
    19     58 ******* out of synch *******
    16     59 ******* out of synch *******
    17     19 ******* out of synch *******
    14     13 ******* out of synch *******
    15     18 ******* out of synch *******
    12     11 ******* out of synch *******
    13     96 ******* out of synch *******
    10     97 ******* out of synch *******
    11     94 ******* out of synch *******
    96     14 ******* out of synch *******
    97     92 ******* out of synch *******
    94     93 ******* out of synch *******
    95     90 ******* out of synch *******
    92     91 ******* out of synch *******
    93    127 ******* out of synch *******
    90    126 ******* out of synch *******
    91    125 ******* out of synch *******
   127     15 ******* out of synch *******
   126    123 ******* out of synch *******
   125    122 ******* out of synch *******
   124     98 ******* out of synch *******
   123    120 ******* out of synch *******
   116    118 ******* out of synch *******
    98    119 ******* out of synch *******
    99     10 ******* out of synch *******
   118     12 ******* out of synch *******
   119     69 ******* out of synch *******
   104      8 ******* out of synch *******
   109      6 ******* out of synch *******
     8      4 ******* out of synch *******
     6      2 ******* out of synch *******
     4      0 ******* out of synch *******
     2     61 ******* out of synch *******
     0     29 ******* out of synch *******
    29     28 ******* out of synch *******
    28     62 ******* out of synch *******
    23     23 ok
    22     22 ok
    21     21 ok
    20     20 ok
    27     27 ok
    26     26 ok
    25     25 ok
    24     24 ok
   105    105 ok
    69    104 ******* out of synch *******
    68     68 ok
    67     67 ok
    66     66 ok
    65     65 ok
    64     64 ok
    63     63 ok
    62    117 ******* out of synch *******
    61    116 ******* out of synch *******
    60     60 ok
   108    109 ******* out of synch *******
   121    121 ok

-------------------------------------------------------

Date: 2000-Dec-10 19:23
By: tim_one

Comment:
OK, looks good.  Changed status to Accepted and assigned back to you for checkin.
-------------------------------------------------------

Date: 2000-Dec-10 19:07
By: gvanrossum

Comment:
Here's a new version. Unchanged Objects/dictobject.c patch; added Lib/UserDict.py, Lib/test/test_types.py, Doc/lib/libstdtypes.tex.

I'll check it in after a 24 hour wait period.
-------------------------------------------------------

Date: 2000-Dec-10 19:03
By: tim_one

Comment:
Good!  Not all is lost:  random blobs of the string and dict code got a thorough code review out of it <wink>.

BTW, if you're going to Pronounce favorably on .popitem(), feel free to tell me to finish the job (just an offer, not a request -- either way is fine by me).
-------------------------------------------------------

Date: 2000-Dec-10 18:44
By: gvanrossum

Comment:
I get the exact same output as you, Tim.

I tried to reproduce my original results with the original implementation again, and indeed I get the same times with or without your "improvement".

My best guess as to why I thought it made a difference is that I misread the poorly designed output from the test program.

I'll add docs and test cases.

-------------------------------------------------------

Date: 2000-Dec-09 14:01
By: gvanrossum

Comment:
New patch uploaded.  This is really Tim Peters's patch, including the correction he posted on Sat, 9 Dec 2000 16:09:30.

I don't know why Tim's original suggested "improvement" worked so well for me -- it really did improve a lot over the original version.  Oh well.
-------------------------------------------------------

Date: 2000-Dec-08 10:28
By: gvanrossum

Comment:
Tim (or anyone else): can you improve this?

It has excellent performance as long as you only do a single dictionary at a time, but goes quadratic if you call popitem() for two identical dictionaries in lock-step.

Maybe jumping criss-cross through the hash table like lookdict does would improve that; but I don't understand the math used for that ("Cycle through GF(2^n)-{0}" ???).

Here's a test program:

import time

for run in range(1000):
    print "run =", run
    for log2size in range(10, 18):
        size = 2**log2size
        print "log2size =", log2size,
        print "size =", size
        a = {}
        t0 = time.clock()
        while 1:
            i = len(a)
            if i >= size:
                break
            a[`i`] = i
        t1 = time.clock()
        print "%.1f usec per item to build (total %.3f sec)" % (
            (1e6*(t1-t0)/size), t1-t0)
        b = a.copy()
        t0 = time.clock()
        try:
            while 1:
                a.popitem()
                b.popitem()
        except KeyError:
            pass
        t1 = time.clock()
        print "%.1f usec per item to destroy twins (total %.3f sec)" % (
            (1e6*(t1-t0)/size), t1-t0)
        assert not a, a
        assert not b, b

-------------------------------------------------------

Date: 2000-Dec-08 10:58
By: tim_one

Comment:
While I haven't yet actually tried it, I'm pretty sure you'd get much better performance (linear) in your test case by changing "finger = i+1" to "finger = i", and no matter how many dicts you march over in lockstep.  In the case of a single dict, it adds a trivial amount of overhead per lookup (one extra C-speed trip around the loop per lookup).

The reason "GF(2^n)-{)}" confuses you is because he should have written "**" instead of "^" <wink>.  In essense, it means "visit each of the 2**n bit patterns exactly once, except for the all-zero pattern", and the "GF" means Galois Field, the theoretical basis for why the bit manipulations actually achieve that.  I don't believe it would help:  the sequence of bit patterns visited remains fixed, and if you continue to move the finger "one beyond" you'll get the same problem (the first pass thru two lockstep iters creates gaps that the second pass has to skip over; the second pass doubles the size of those gaps; the third pass doubles them again, etc).

I agree that sharing one finger is very attractive.  +1.
-------------------------------------------------------

Date: 2000-Dec-08 11:04
By: gvanrossum

Comment:
New patch with Tim's suggestion.  Works well!
-------------------------------------------------------

Date: 2000-Dec-08 14:55
By: tim_one

Comment:
Actually, your test program with the new patch continues to show rotten behavior for me.  dict.copy() returns a dict with the same value, but not necessarily the same internal structure.  This is easy to see by changing the test's inner loop to

while 1:
    gota = a.popitem()
    gotb = b.popitem()
    assert gota == gotb, (gota, gotb, len(a), len(b))

This fails instantly for me:

run = 0
log2size = 10 size = 1024
10.2 usec per item to build (total 0.010 sec)
Traceback (most recent call last):
  File "tdict.py", line 27, in ?
    assert gota == gotb, (gota, gotb, len(a), len(b))
AssertionError: (('773', 773), ('479', 479), 1023, 1023)

If I make the dicts truly identical, by building b via the same operations in the same order as occurred for a, then the assert doesn't trigger and performance is wonderful (deleting the dicts is faster than building them).  But in that case, the original "finger = i+1" is also very good (just a little slower).  My head analysis of that case was wrong:  the first pass does create a checkerboard pattern in both dicts, but the second pass systematically hits the holes created by the first pass.

Are you sure your test program worked a lot better after the patch?  Best I can tell, what I'm seeing has very little to do with the finger strategy and very much to do with accidents in the way dict.copy() works.
-------------------------------------------------------

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102733&group_id=5470