Speed up this code?

Tim Chase python.list at tim.thechases.com
Fri May 26 03:24:13 CEST 2006


> def rmlist(original, deletions):
>    return [i for i in original if i not in deletions]
> 
> original will be a list of odd numbers and deletions will be numbers
> that are not prime, thus this code will return all items in original
> that are not in deletions. For n > 100,000 or so, the program takes a
> very long time to run, whereas it's fine for numbers up to 10,000.
> 
> Does anybody know a faster way to do this? (finding the difference all
> items in list a that are not in list b)?

Testing for membership in an unsorted list is an O(n) sort of 
operation...the larger the list, the longer it takes.

I presume order doesn't matter, or that the results can be sorted 
after the fact.  If this is the case, it's quite efficient to use 
sets which provide intersection/difference/union methods.  If you 
pass in sets rather than lists, you can simply

	return original.difference(deletions)

It's almost not worth calling a function for :)  There's also an 
in-place version called difference_update().

Once you've found all the results you want, and done all the set 
differences you want, you can just pass the resulting set to a 
list and sort it, if sorted results matter.

-tkc







More information about the Python-list mailing list