Tim Peters on speed!

Tim Peters tim.one at home.com
Sun Nov 11 19:50:22 CET 2001

[Don Tuttle]
> Ok, now that I have your attention <wink>...

Hmm -- I confess it worked.  Good ploy!  I hope to see a lot more of these

> In studing Tim' recipe "Remove duplicates from a sequence"
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560
> Tim assigns the result back to the original list rather than a new list.
> Was this done for performance or convenients?  Is there a performance
> penalty for using a second list?
>     ...
>     try:
>           t = list(s)
>           t.sort()
>     except TypeError:
>           del t  # move on to the next method
>     else:
>           assert n > 0
>           last = t[0]
>           lasti = i = 1
>           while i < n:
>               if t[i] != last:
>                   t[lasti] = last = t[i]
>                   lasti += 1
>               i += 1
>           return t[:lasti]

It's primarily to save memory:  the "t = list(s)" already allocated all the
memory we could possibly need for the result, and the result *can* be
computed in place, so why allocate more?  Storing into a preallocated list
is definitely quicker than .append()'ing elements, one at a time, to an
initially empty list, if that's the alternative you have in mind.  A
comparatively minor consideration is that crawling over two lists is also
less cache-friendly than crawling over one.

Note that in no case does the unique() function actually overwrite *the*
original list:  the input sequence is s, not t, and s may not even be a list
(before Python 2.2, it can be any sequence; in Python 2.2, any iterable
object).  In case you were confused about that, note too that list(s) makes
a (shallow) copy of s when s is a list:  nothing in unique() mutates s
(although it's possible that a perverse user-defined __hash__ or __cmp__
method could mutate s as a side-effect of hashing or comparing s's

not-responsible-for-insane-user-methods-ly y'rs - tim

More information about the Python-list mailing list