Tim Peters on speed!
Tim Peters
tim.one at home.com
Sun Nov 11 13:50:22 EST 2001
[Don Tuttle]
> Ok, now that I have your attention <wink>...
Hmm -- I confess it worked. Good ploy! I hope to see a lot more of these
<wink>.
> In studing Tim' recipe "Remove duplicates from a sequence"
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560
>
> Tim assigns the result back to the original list rather than a new list.
> Was this done for performance or convenients? Is there a performance
> penalty for using a second list?
>
> ...
> try:
> t = list(s)
> t.sort()
> except TypeError:
> del t # move on to the next method
> else:
> assert n > 0
> last = t[0]
> lasti = i = 1
> while i < n:
> if t[i] != last:
> t[lasti] = last = t[i]
> lasti += 1
> i += 1
> return t[:lasti]
It's primarily to save memory: the "t = list(s)" already allocated all the
memory we could possibly need for the result, and the result *can* be
computed in place, so why allocate more? Storing into a preallocated list
is definitely quicker than .append()'ing elements, one at a time, to an
initially empty list, if that's the alternative you have in mind. A
comparatively minor consideration is that crawling over two lists is also
less cache-friendly than crawling over one.
Note that in no case does the unique() function actually overwrite *the*
original list: the input sequence is s, not t, and s may not even be a list
(before Python 2.2, it can be any sequence; in Python 2.2, any iterable
object). In case you were confused about that, note too that list(s) makes
a (shallow) copy of s when s is a list: nothing in unique() mutates s
(although it's possible that a perverse user-defined __hash__ or __cmp__
method could mutate s as a side-effect of hashing or comparing s's
elements).
not-responsible-for-insane-user-methods-ly y'rs - tim
More information about the Python-list
mailing list