fastest way to merge lists.

Tim Peters tim_one at email.msn.com
Sun Jan 9 19:51:55 EST 2000


[Tim, replying to RoyS]
> You can save a little typing via e.g.
>
> for s in list1, list2, list3, list4, list5:
>     for x in s:
>         d[x] = 1

[Aahz Maruch]
> In practice, I know that the integer "1" is a cached,
> unique object, but that's an implementation detail that I
> prefer not to rely on.

You could force this issue by doing e.g. "one = 1" before the loop nest and
referring to "one" within it, but, as you know, that doesn't make any
difference today (& very unlikely it ever will -- not only is "1" (along
with some other small integers) globally unique today, but within a code
block all numeric literals of any kind (strings too, BTW) are optimized to
uniqueness within the block).  So Guido would have to lose his mind twice
over for this to change.

For a surprising reason given below, you may actually want to use this
assignment trick if you use None.

> I therefore almost always would use "d[x] = None"; that's
> particularly true if I *might* want to attach data to it
> later, in addition to making it clear that I'm interested
> only in the dictionary keys.

Fine by me!  Guido usually uses None too.  I used 1 partly because Roy did,
but I confess it's a habit I carried in from other languages.  For example,
before Icon grew a distinct "set" type, the idiomatic way to do sets was to
use Icon's flavor of dict with a default value of 0, and map the items in
the set to 1.

Brief digression:  when you create an Icon dict, you can specify a default
value D, which is any old object.  Then an Icon dict lookup works like the
Python d.get(lookitup, D).  You get D on key absence even if you would
rather have gotten an error, so it's a dubious feature, but it can be
convenient (indeed, that's why Python's dicts grew a .get() method!).

Anyway, the idea of representing set membership by a "true" value and
absence by a "false" one got so deeply ingrained, that even now when I see
"d[key] = None", I mechanically think "OK, so they're taking key *out* of
the set now"(!).  This is why Python should be the only language people are
allowed to learn <wink>.

Curiously, Roy has an actual reason to use 1 (or something else other than
None), because he's specifically looking for speed.  "1" is fetched up by
index from the local constant vector, but "None" is a builtin (not a
keyword) so first suffers a failing lookup in the global dict before a
successful lookup in the builtin dict.  So each dynamic occurrence of

    d[whatever] = None

actually triggers 3(!) dict lookups (the obvious one, plus two hidden ones
to find the None object).

The effect varies across platforms, but can be significant.  Run the
attached to see what it's like on yours (on my P5-166, using 1 is a bit over
25% faster *overall* than using None, via the usual speedup = (slower -
faster) / faster * 100).  Using the "assignment trick" to create a local
name for None eliminates the speed difference.

just-three-of-the-millions-of-fascinating-None-stories<wink>-ly
    y'rs  - tim


def timeit(f, n=10):
    from time import clock
    start = clock()
    for i in range(n): f()
    finish = clock()
    print f.__name__ + ":", round(finish - start, 4)

indices = range(50000)

def timeNone():
    d = {}
    for i in indices:
        d[i] = None

def timeOne():
    d = {}
    for i in indices:
        d[i] = 1

def timeLocalNone(None=None):
    d = {}
    for i in indices:
        d[i] = None

timeit(timeNone)
timeit(timeOne)
timeit(timeLocalNone)






More information about the Python-list mailing list