Iterate over text file, discarding some lines via context manager
Akira Li
4kir4.1i at gmail.com
Fri Nov 28 16:33:55 EST 2014
Ned Batchelder <ned at nedbatchelder.com> writes:
> On 11/28/14 10:22 AM, Dave Angel wrote:
>> On 11/28/2014 10:04 AM, fetchinson . wrote:
>>> Hi all,
>>>
>>> I have a feeling that I should solve this by a context manager but
>>> since I've never used them I'm not sure what the optimal (in the
>>> python sense) solution is. So basically what I do all the time is
>>> this:
>>>
>>> for line in open( 'myfile' ):
>>>     if not line:
>>>         # discard empty lines
>>>         continue
>>>     if line.startswith( '#' ):
>>>         # discard lines starting with #
>>>         continue
>>>     items = line.split( )
>>>     if not items:
>>>         # discard lines with only spaces, tabs, etc
>>>         continue
>>>
>>>     process( items )
>>>
>>> You see I'd like to ignore lines which are empty, start with a #, or
>>> are only white space. How would I write a context manager so that the
>>> above simply becomes
>>>
>>> with some_tricky_stuff( 'myfile' ) as items:
>>>     process( items )
>>>
>>
>> I see what you're getting at, but a context manager is the wrong
>> paradigm. What you want is a generator. (untested)
>>
>> def mygenerator(filename):
>>     with open(filename) as f:
>>         for line in f:
>>             if not line: continue
>>             if line.startswith('#'): continue
>>             items = line.split()
>>             if not items: continue
>>             yield items
>>
>> Now your caller simply does:
>>
>> for items in mygenerator(filename):
>>     process(items)
>>
>>
>
> I think it's slightly better to leave the open outside the generator:
>
> def interesting_lines(f):
>     for line in f:
>         line = line.strip()
>         if line.startswith('#'):
>             continue
>         if not line:
>             continue
>         yield line
>
> with open("my_config.ini") as f:
>     for line in interesting_lines(f):
>         do_something(line)
>
> This makes interesting_lines a pure filter, and doesn't care what sort
> of sequence of strings it's operating on. This makes it easier to
> test, and more flexible. The caller's code is also clearer in my
> opinion.
>
> BTW: this example is taken verbatim from my PyCon presentation on
> iteration, if you are interested:
> http://nedbatchelder.com/text/iter.html
The conditions could be combined in this case:

  def iter_rows(lines):
      for line in lines:
          items = line.split()
          if items and not items[0].startswith('#'):
              yield items # space-separated non-empty non-comment items

  with open(filename) as file:
      for items in iter_rows(file):
          process(items)
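
Because iter_rows() accepts any iterable of strings, it can be tried without a real file. Here is a minimal sketch that feeds it an in-memory io.StringIO object; the sample text and the name "sample" are made up for illustration:

```python
import io

def iter_rows(lines):
    """Yield the whitespace-separated items of each non-blank, non-comment line."""
    for line in lines:
        items = line.split()
        # An empty list means the line was blank or whitespace-only.
        if items and not items[0].startswith('#'):
            yield items

# Hypothetical sample input standing in for an open file.
sample = io.StringIO(
    "# header\n"
    "\n"
    "   \t\n"
    "alpha beta\n"
    "  # indented comment\n"
    "gamma\n"
)

rows = list(iter_rows(sample))
print(rows)  # [['alpha', 'beta'], ['gamma']]
```

Note one difference from the line.startswith('#') check in the original loop: because the test here is applied to the first item after splitting, a comment preceded by spaces or tabs is discarded as well.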
--
Akira