Quick compare string to list

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Thu Oct 1 00:10:32 CEST 2009


On Wed, 30 Sep 2009 11:36:03 -0700, Scooter wrote:

> I'm reading in a text file, and for each line in the file, I'm looking
> for the existence of phrases from a list. The list contains approx. 120
> items currently but will most likely grow. This procedure itself is not
> the main function of my program and only grew out of the need to
> reformat certain phrases I'm finding in a file before re-outputting it.
> But as I suspected, this searching of the lists slows the whole process
> way way down. Was looking for ideas of a better way to do this.
> 
> I basically have
> 
> mylist=[]
> ...
> code that reads in the flat file into string 'flatfileString' ...
> for listitem in mylist:
>     if flatfileString.count(listitem):
>         ...whatever...I found it.


For starters, why are you bothering to count occurrences of the string if
you only need a There/Not There answer? That's wasteful: count() has to
walk the entire length of flatfileString every single time, whereas a
containment test can stop at the first match. Now, str.count() is likely
to be fast because it's written in C, but it's not instantaneous. Better
is:


for listitem in mylist:
    if listitem in flatfileString:
        process()


That should show a small improvement, but you can probably do better.
Here are three more simple approaches worth trying, all untested:

# Use a regex.
import re
# Escape each item so characters like '.' or '?' match literally.
r = re.compile('|'.join(map(re.escape, mylist)))  # item 0 or item 1 or ...
if r.search(flatfileString):
    process()
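
One caveat with joining the phrases into a pattern: if any phrase contains
characters that are special in a regex (such as '.' or '?'), the pattern
will not mean what you want unless each phrase goes through re.escape()
first. A small sketch with made-up phrases and text (nothing here comes
from your actual data):

```python
import re

# Hypothetical phrases and text, chosen to contain regex metacharacters.
mylist = ["v1.0", "what?"]
flatfileString = "upgraded to v110 yesterday, so what"

# Naive join: '.' matches any character and '?' makes the 't' optional.
naive = re.compile("|".join(mylist))

# Escaped join: every phrase is matched literally.
safe = re.compile("|".join(re.escape(item) for item in mylist))

naive_hit = naive.search(flatfileString)
safe_hit = safe.search(flatfileString)

# The naive pattern reports a false positive; the escaped one finds nothing.
print(naive_hit.group(0) if naive_hit else None)
print(safe_hit.group(0) if safe_hit else None)
```

Neither phrase occurs literally in that text, so only the escaped pattern
gives the right answer.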


# Use a loop, re-writing it as a list comprehension for speed.
if any([item in flatfileString for item in mylist]):
    process()


# As above, but a generator expression instead.
if any(item in flatfileString for item in mylist):
    process()
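
The difference between the last two is that the list comprehension tests
every item before any() gets to look at the results, while the generator
expression lets any() stop at the first hit. A sketch that makes this
visible, using a made-up helper contains() that records each test it
performs (the helper and the sample data are for illustration only):

```python
checked = []

def contains(item, text):
    # Record each phrase that actually gets tested (illustration only).
    checked.append(item)
    return item in text

# Hypothetical data: the very first phrase is present in the text.
mylist = ["spam", "eggs", "ham", "toast"]
flatfileString = "a line mentioning spam and nothing else"

# List comprehension: the whole list is built before any() sees it.
any([contains(item, flatfileString) for item in mylist])
list_checks = len(checked)

del checked[:]  # reset the record

# Generator expression: any() stops as soon as one test is true.
any(contains(item, flatfileString) for item in mylist)
gen_checks = len(checked)

print(list_checks, gen_checks)
```

With a match on the first phrase, the list version still performs all four
tests and the generator version performs one. When nothing matches, both
do the same number of tests.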



You will probably find that which approach is faster depends on how many 
items are in mylist.
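
If you want to measure rather than guess, the timeit module will do it. A
rough sketch, assuming made-up data (120 phrases that never match, so each
approach does its maximum amount of work); the function names are mine, and
the numbers will differ with your real file and list:

```python
import re
import timeit

# Hypothetical sample data: 120 phrases, none of which occur in the text.
mylist = ["phrase number %d" % i for i in range(120)]
flatfileString = "some ordinary line of text from the flat file " * 20

# Compile once, outside the timed functions, as you would in real code.
pattern = re.compile("|".join(map(re.escape, mylist)))

def with_count():
    # The original approach: count every occurrence of every phrase.
    return any(flatfileString.count(item) for item in mylist)

def with_in():
    # Containment test with a generator expression.
    return any(item in flatfileString for item in mylist)

def with_listcomp():
    # Containment test with a list comprehension.
    return any([item in flatfileString for item in mylist])

def with_regex():
    # One pass with a single alternation pattern.
    return pattern.search(flatfileString) is not None

timings = {}
for func in (with_count, with_in, with_listcomp, with_regex):
    timings[func.__name__] = timeit.timeit(func, number=200)

# Print the candidates from fastest to slowest.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print("%-14s %.4f s" % (name, seconds))
```

It is worth repeating the run with a text in which an early phrase does
match, since that is the case where the short-circuiting versions have the
most to gain.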

If none of these approaches are fast enough, you may need to look at a 
more complicated approach, such as Bearophile's suggestion.



-- 
Steven


