Algorithm help per favore

Steven Taschuk staschuk at telusplanet.net
Wed Jun 18 18:41:23 CEST 2003


Quoth Larry:
> I need to take a list (probably 20k to 40k elements) of numbers and
> remove consecutive duplicates. Non consecutive duplicates are ok.
> 
> Example: [6,3,3,3,5,7,6,3,4,4,3] => [6,3,5,7,6,3,4,3]
> 
> The 3 and 6 can appear more than once in the result set because
> they're separated by another value. Obviously this is trivial to
> accomplish by walking thru the list and building a new one (or yanking
> elements out of the existing one) but I'm curious if anyone knows of a
> more clever way, with speed being a goal more than memory usage.

If speed is the goal, removing elements from the existing list is
not indicated; that would take O(n**2) time, since each removal
has to shift all subsequent elements down one spot.

Building a new list seems not only like the obvious approach, but
probably the fastest.  I think it quite unlikely that there's any
faster clever algorithm.

    def unique(iterable):
        result = []
        append = result.append       # avoid attribute lookup in loop
        next = iter(iterable).next   # ditto
        try:
            previous = next()
            append(previous)
            while True:
                value = next()
                if value != previous:
                    append(value)
                    previous = value
        except StopIteration:
            pass
        return result

(If latency is important, this could be changed into a generator
easily; that would also decrease memory usage.  I'm not sure what
it would do to throughput -- try it and see.)

I don't see any way to use built-ins or whatnot to move the loop
into C.  Of course, you could recode the whole thing in C.

-- 
Steven Taschuk             "[W]e must be very careful when we give advice
staschuk at telusplanet.net    to younger people: sometimes they follow it!"
                             -- "The Humble Programmer", Edsger Dijkstra





More information about the Python-list mailing list