[Tutor] list comprehension, testing for multiple conditions

Peter Otten __peter__ at web.de
Wed Aug 22 09:06:11 CEST 2012


Pete O'Connell wrote:

> Hi I am trying to parse a text file and create a list of all the lines
> that don't include: "vn", "vt" or are empty. I want to make this as
> fast as possible because I will be parsing many files each containing
> thousands of lines. I thought I would give list comprehensions a try.
> The last 3 lines of the code below have three list comprehensions that
> I would like to combine into 1 but I am not sure how to do that.
> Any tips would be greatly appreciated
> 
> pete
> 
> #start############################################################
> fileName = '/usr/home/poconnell/Desktop/objCube.obj'
> theFileOpened = open(fileName,'r')
> theTextAsList = theFileOpened.readlines()

If you have a file with 1,000,000 lines you now have a list of 1,000,000
strings, of which perhaps 1,000 match your criteria. You are squandering
memory. Rule of thumb: never use readlines(); iterate over the file
directly.
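
A minimal sketch of what iterating directly looks like. The io.StringIO
object and its contents are made up for illustration and only stand in for a
real file object returned by open():

```python
import io

# Hypothetical stand-in for open(fileName); a StringIO object yields
# lines one at a time, just like a real file object.
fake_file = io.StringIO("v 1.0 2.0 3.0\nvn 0.0 0.0 1.0\n\nf 1 2 3\n")

kept = []
for line in fake_file:          # no readlines(): one line at a time
    if "vn" not in line:
        kept.append(line.rstrip("\n"))

print(kept)
```

Only the lines that pass the test end up in the list. The empty line is
still kept here (as "") because this sketch tests for "vn" only.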

> theTextAsListStripped = []
> for aLine in theTextAsList:
> 
>     theTextAsListStripped.append(aLine.strip("\n"))
> 
> theTextAsListNoVn = [x for x in theTextAsListStripped if "vn" not in x]
> theTextAsListNoVnOrVt = [x for x in theTextAsListNoVn if "vt" not in x]
> theTextAsListNoVnOrVtOrEmptyLine = [x for x in theTextAsListNoVn if x !=
> ""]

I think that should be

theTextAsListNoVnOrVtOrEmptyLine = [x for x in theTextAsListNoVnOrVt if x != 
""]

You can combine the three conditions with "and" in a single list-comp:

with open(filename) as lines:
    wanted = [line.strip("\n") for line in lines
                  if "vn" not in line and "vt" not in line and line != "\n"]


You can even have multiple if clauses in one list-comp (but that is rarely 
used):

with open(filename) as lines:
    wanted = [line.strip("\n") for line in lines
                  if "vn" not in line
                  if "vt" not in line
                  if line != "\n"]
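
Chained if clauses and a single clause joined with "and" should select the
same lines. A small check, assuming a made-up list of lines in place of the
open file:

```python
# Hypothetical sample data standing in for the lines of an .obj file.
lines = ["v 1 2 3\n", "vn 0 0 1\n", "vt 0.5 0.5\n", "\n", "f 1/1/1 2/2/2\n"]

# One if clause, conditions joined with "and".
with_and = [line.strip("\n") for line in lines
            if "vn" not in line and "vt" not in line and line != "\n"]

# Several if clauses; a line is kept only if every clause is true.
with_ifs = [line.strip("\n") for line in lines
            if "vn" not in line
            if "vt" not in line
            if line != "\n"]

print(with_and == with_ifs)
```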

While your problem is simple enough to combine all filters into one
list-comp, some problems are not. You can then prevent the intermediate lists
from materializing by using generator expressions. This keeps memory
consumption low and should be (almost) as fast. For example:

with open(filename) as lines:
    # use gen-exps to remove empty and whitespace-only lines
    stripped = (line.strip() for line in lines)
    nonempty = (line for line in stripped if line)

    wanted = [line for line in nonempty 
                  if "vt" not in line and "vn" not in line]
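
To try the same pipeline without a file on disk, it could be wrapped in a
function. This is only a sketch: the name wanted_lines and the sample text
are invented for illustration, and any iterable of lines is assumed as input:

```python
import io

def wanted_lines(lines):
    """Return stripped, non-empty lines containing neither "vt" nor "vn"."""
    stripped = (line.strip() for line in lines)      # lazy, no list built
    nonempty = (line for line in stripped if line)   # lazy, no list built
    # Only this final list-comp pulls lines through the two generators.
    return [line for line in nonempty
            if "vt" not in line and "vn" not in line]

# Hypothetical sample data; io.StringIO behaves like an open text file.
sample = io.StringIO("v 1 2 3\nvn 0 0 1\n   \nvt 0.5 0.5\n\nf 1 2 3\n")
result = wanted_lines(sample)
print(result)
```

Note that strip() without arguments also removes leading and trailing
spaces, so whitespace-only lines are dropped as well.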



