Algorithm help per favore
staschuk at telusplanet.net
Wed Jun 18 18:41:23 CEST 2003
> I need to take a list (probably 20k to 40k elements) of numbers and
> remove consecutive duplicates. Non consecutive duplicates are ok.
> Example: [6,3,3,3,5,7,6,3,4,4,3] => [6,3,5,7,6,3,4,3]
> The 3 and 6 can appear more than once in the result set because
> they're separated by another value. Obviously this is trivial to
> accomplish by walking thru the list and building a new one (or yanking
> elements out of the existing one) but I'm curious if anyone knows of a
> more clever way, with speed being a goal more than memory usage.
If speed is the goal, removing elements from the existing list is
not indicated; that would take O(n**2) time, since each removal
has to shift all subsequent elements down one spot.
Building a new list seems not only like the obvious approach, but
probably the fastest. I think it quite unlikely that there's any
faster clever algorithm.
result = 
append = result.append # avoid attribute lookup in loop
next = iter(iterable).next # ditto
previous = next()
value = next()
if value != previous:
previous = value
(If latency is important, this could be changed into a generator
easily; that would also decrease memory usage. I'm not sure what
it would do to throughput -- try it and see.)
I don't see any way to use built-ins or whatnot to move the loop
into C. Of course, you could recode the whole thing in C.
Steven Taschuk "[W]e must be very careful when we give advice
staschuk at telusplanet.net to younger people: sometimes they follow it!"
-- "The Humble Programmer", Edsger Dijkstra
More information about the Python-list