[Tutor] List Dup-Elim Method?

Curtis Larsen curtis.larsen@Covance.Com
Mon, 05 Feb 2001 13:15:23 -0600


Tim - 

Thanks!  This helps a lot.  I've used the dictionary method before
(with great success), but the when you have lists within lists
(sometimes within lists) it takes more time to set it up the dictionary
stuff than it would to do something like this.  Thanks again!

Curtis

>>> "Tim Peters" <tim.one@home.com> 02/03/2001 3:43:14 AM >>>
If the elements of a list are hashable, the method using a dict will be
by
far the fastest if the list is large.

If the elements are not hashable (for example, a list of lists), but
can be
sorted, still much quicker than brute force (yet still much slower
than
using a temporary dict!) is this:

def dupfree(x):
    """Remove all duplicates from list x, in place.

    The elements of x must enjoy a total ordering.
    """

    n = len(x)
    if n > 1:
        x.sort()
        last = x[0]
        i = avail = 1
        while i < n:
            if x[i] != last:
                x[avail] = last = x[i]
                avail += 1
            i += 1
        del x[avail:]

That requires Python 2.0, because of the "+=" thingies.

Example:

>>> x = [[1, 2], [3, 4]]
>>> x *= 10
>>> x
[[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4],
 [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4],
 [1, 2], [3, 4], [1, 2], [3, 4]]
>>> dupfree(x)
>>> x
[[1, 2], [3, 4]]
>>>

How does it work?  Sorting the list brings all the equal elements next
to
each other.  (Note:  Many sorting routines don't work very quickly
when
there are lots of equal elements, but Python's does.  I know that
because I
wrote Python's list.sort() <wink>.)

After all the equal elements are adjacent, it just marches across the
list
once, keeping track of when the list element at index i *changes*. 
When it
does, it moves that not-seen-before element toward the front of the
list,
and moves on skipping over the chunk of elements equal to *it*. 
Finally, it
gets rid of the list positions no longer needed (the "del" stmt).

Of course there's nothing to do if the list doesn't have at least two
elements to start, so it checks for that first.  It *has* to avoid
going
into the main body if the list is empty, because then "last = x[0]"
would
blow up.  Since it has to check for that anyway, it doesn't cost any
more to
make sure there are at least two things in the list.

This isn't an easy algorithm -- the steps are delicate.  If you can use
the
dict method instead, do so!

what-you-can-assume-constrains-what-you-can-do-ly y'rs  - tim


_______________________________________________
Tutor maillist  -  Tutor@python.org 
http://mail.python.org/mailman/listinfo/tutor


-----------------------------------------------------
Confidentiality Notice: This e-mail transmission 
may contain confidential or legally privileged 
information that is intended only for the individual 
or entity named in the e-mail address. If you are not 
the intended recipient, you are hereby notified that 
any disclosure, copying, distribution, or reliance 
upon the contents of this e-mail is strictly prohibited. 

If you have received this e-mail transmission in error, 
please reply to the sender, so that we can arrange 
for proper delivery, and then please delete the message 
from your inbox. Thank you.