[Tutor] list comprehension, testing for multiple conditions
Peter Otten
__peter__ at web.de
Wed Aug 22 09:06:11 CEST 2012
Pete O'Connell wrote:
> Hi I am trying to parse a text file and create a list of all the lines
> that don't include: "vn", "vt" or are empty. I want to make this as
> fast as possible because I will be parsing many files each containing
> thousands of lines. I thought I would give list comprehensions a try.
> The last 3 lines of the code below have three list comprehensions that
> I would like to combine into 1 but I am not sure how to do that.
> Any tips would be greatly appreciated
>
> pete
>
> #start############################################################
> fileName = '/usr/home/poconnell/Desktop/objCube.obj'
> theFileOpened = open(fileName,'r')
> theTextAsList = theFileOpened.readlines()
If you have a file with 1,000,000 lines, you now have a list of 1,000,000
strings, of which perhaps 1,000 match your criteria. You are squandering
memory. Rule of thumb: never use readlines(); iterate over the file
directly.
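
To illustrate that rule of thumb, here is a minimal sketch. The io.StringIO
object and its contents are made up for the example; it merely stands in for
an open text file, which is iterated line by line in the same way:

```python
import io

# Hypothetical stand-in for open(fileName); the sample data is invented.
sample = io.StringIO("v 1 2 3\nvn 0 0 1\n\nf 1 2 3\n")

kept = []
for line in sample:  # fetches one line at a time, no full list of lines
    if "vn" not in line:
        kept.append(line.rstrip("\n"))
```

Only the lines that pass the test end up in the list; the rest are discarded
as soon as the loop moves on.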
> theTextAsListStripped = []
> for aLine in theTextAsList:
>
>     theTextAsListStripped.append(aLine.strip("\n"))
>
> theTextAsListNoVn = [x for x in theTextAsListStripped if "vn" not in x]
> theTextAsListNoVnOrVt = [x for x in theTextAsListNoVn if "vt" not in x]
> theTextAsListNoVnOrVtOrEmptyLine = [x for x in theTextAsListNoVn if x !=
> ""]
I think that should be
theTextAsListNoVnOrVtOrEmptyLine = [x for x in theTextAsListNoVnOrVt if x !=
""]
You can combine the three conditions with "and" in a single list-comp:
    with open(fileName) as lines:
        wanted = [line.strip("\n") for line in lines
                  if "vn" not in line and "vt" not in line and line != "\n"]
You can even have multiple if clauses in one list-comp (but that is rarely
used):
    with open(fileName) as lines:
        wanted = [line.strip("\n") for line in lines
                  if "vn" not in line
                  if "vt" not in line
                  if line != "\n"]
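
As a quick check that a single "and" condition and several chained if clauses
select the same lines, here is a small sketch. The sample text and the two
function names are invented for the illustration, and io.StringIO stands in
for a real file:

```python
import io

# Made-up sample in the spirit of an .obj file.
SAMPLE = "v 1.0 2.0 3.0\nvn 0.0 0.0 1.0\nvt 0.5 0.5\n\nf 1 2 3\n"

def with_and(lines):
    # all three tests joined into one condition
    return [line.strip("\n") for line in lines
            if "vn" not in line and "vt" not in line and line != "\n"]

def with_chained_ifs(lines):
    # one if clause per test; a line must pass every clause
    return [line.strip("\n") for line in lines
            if "vn" not in line
            if "vt" not in line
            if line != "\n"]

result_and = with_and(io.StringIO(SAMPLE))
result_ifs = with_chained_ifs(io.StringIO(SAMPLE))
```

Both calls should produce the same list, containing only the "v" and "f"
lines.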
While your problem is simple enough to combine all filters into one
list-comp, some problems are not. You can then prevent the intermediate lists
from materializing by using generator expressions. This approach minimizes
memory consumption, too, and should be (almost) as fast. For example:
    with open(fileName) as lines:
        # use gen-exps to remove empty and whitespace-only lines
        stripped = (line.strip() for line in lines)
        nonempty = (line for line in stripped if line)
        wanted = [line for line in nonempty
                  if "vt" not in line and "vn" not in line]
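
To see that the generator expressions do no work until the final list-comp
consumes them, here is a sketch of the same pipeline on invented sample data.
Again io.StringIO is only a stand-in for a real file, and the name
read_before_consuming is introduced just for this example:

```python
import io

# Made-up sample, including a whitespace-only line and an empty line.
sample = io.StringIO("v 1 2 3\n   \nvn 0 0 1\n\nvt 0.5 0.5\nf 1 2 3\n")

stripped = (line.strip() for line in sample)
nonempty = (line for line in stripped if line)

# Nothing has been read yet: the gen-exps are lazy.
read_before_consuming = sample.tell()

# The list-comp pulls lines through both generators one at a time.
wanted = [line for line in nonempty
          if "vt" not in line and "vn" not in line]
```

Note that strip() without arguments also removes leading and trailing
whitespace, so the whitespace-only line is dropped here, which would not
happen with strip("\n").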