[Tutor] a question about list
Kent Johnson
kent37 at tds.net
Fri Nov 12 04:19:12 CET 2004
Max Noel wrote:
>
> On Nov 11, 2004, at 23:17, Terry Carroll wrote:
> a = [element for index, element in enumerate(a) if index not in b]
>
> Looks a bit more Pythonic to me. I also think it's faster.
Regular readers of this list know I can't resist an optimization
challenge :-) Also I was suspicious of Max's claim, as this method is
clearly O(len(a) * len(b)) at least - for each element of a, the list b
is searched for the index.
Terry Carroll proposed:
> b_rev = b
Note: if you want to copy b, use b_rev = b[:]
> b_rev.sort() # b = [1, 2, 4] ; sorted just to be sure
> b_rev.reverse() # b = [4, 2, 1]
> for i in b_rev:
> del a[i]
At first glance this seems more promising, as it skips the searching
through b. But the del a[i] step requires copying all the elements of a
after location i, so it is O(len(a)) and this method also seems to be
O(len(a) * len(b)) - that is, the time taken will be proportional to the
length of a times the length of b.
Liam Clarke proposed
> for i in b:
> a[i]="XX" <-- this makes a=['a','XX','XX','d','XX','f','g']
>
> while a.count("XX"): While the number of occurences of 'XX' in a are above zero, do this...
> a.remove("XX") Remove the first occurrence of 'XX' found
This doesn't strike me as too promising because removing the 'XX' values
requires _two_ searches through a for _each_ removal. But the removal
could be done with a list comprehension which would make it work with a
single pass over a:
a = [ element for element in a if element != 'XX' ]
Of course this requires that 'XX' does not appear in a, which could be a
problem. A way around that is to make a unique sentinel and use it instead:
class Sentinel:
pass
s = Sentinel()
This algorithm appears to be O(len(a) + len(b)) which could be a
significant advantage for large a and b.
So, those are the three contenders. Here is a program to test them. It
removes every fourth element from a list of 10000 integers:
##################
import random, timeit
a = range(10000)
b = range(0, 10000, 4)
random.shuffle(b) # Make the sort do some work
def remove1(a, b):
b = b[:]
b.sort()
b.reverse()
for i in b:
del a[i]
def remove2(a, b):
a[:] = [ element for i, element in enumerate(a) if i not in b ]
def remove3(a, b):
class Sentinel: pass
s = Sentinel()
for i in b:
a[i] = s
a[:] = [elem for elem in a if elem is not s]
def timeOne(fn):
setup = "from __main__ import " + fn.__name__ + ',a,b'
stmt = fn.__name__ + '(a[:], b[:])'
t = timeit.Timer(stmt, setup)
secs = min(t.repeat(3, 1))
print '%s: %f secs' % (fn.__name__, secs)
# Make sure they all give the same answer
a1=a[:]
b1 = b[:]
remove1(a1, b1)
a2=a[:]
b2 = b[:]
remove2(a2, b2)
a3=a[:]
b3 = b[:]
remove3(a3, b3)
assert a1 == a2
assert a3 == a2
timeOne(remove1)
timeOne(remove2)
timeOne(remove3)
################
Here are my results:
remove1: 0.023205 secs
remove2: 1.262509 secs
remove3: 0.004119 secs
So with large a and b, the algorithm with the sentinel is a clear
winner. The enumerate algorithm is far slower than the one with del
a[i]; my guess this is because the del does a block move of memory,
which can be very fast, while the enumerate version has to actually search.
What about for small a and b? Here are my results with
a = range(100)
b = range(0, 100, 4)
remove1: 0.000032 secs
remove2: 0.000181 secs
remove3: 0.000052 secs
Now the del version wins; I think the very fast built-in list operations
take the prize here.
Of course with small a and b, these are all very fast and the enumerate
version has the advantage of being concise and readable.
Kent
More information about the Tutor
mailing list