Parsing a file with iterators
Marc 'BlackJack' Rintsch
bj_666 at gmx.net
Fri Oct 17 12:45:42 EDT 2008
On Fri, 17 Oct 2008 11:42:05 -0400, Luis Zarrabeitia wrote:
> I need to parse a text file. The format is something like this:
>
> TYPE1 metadata
> data line 1
> data line 2
> ...
> data line N
> TYPE2 metadata
> data line 1
> ...
> TYPE3 metadata
> ...
> […]
> because when the parser iterates over the input, it can't know that it
> finished processing the section until it reads the next "TYPE" line
> (actually, until it reads the first line that it cannot parse, which if
> everything went well, should be the 'TYPE'), but once it reads it, it is
> no longer available to the outer loop. I wouldn't like to leak the
> internals of the parsers to the outside.
>
> What could I do?
> (to the curious: the format is a dialect of the E00 used in GIS)
Group the lines before processing and feed each group to the right parser:

import sys
from itertools import groupby, imap
from operator import itemgetter


def parse_a(metadata, lines):
    print 'parser a', metadata
    for line in lines:
        print 'a', line


def parse_b(metadata, lines):
    print 'parser b', metadata
    for line in lines:
        print 'b', line


def parse_c(metadata, lines):
    print 'parser c', metadata
    for line in lines:
        print 'c', line


def test_for_type(line):
    return line.startswith('TYPE')


def parse(lines):
    def tag():
        # Pair every data line with the most recent TYPE line.
        type_line = None
        for line in lines:
            if test_for_type(line):
                type_line = line
            else:
                yield (type_line, line)

    type2parser = {'TYPE1': parse_a,
                   'TYPE2': parse_b,
                   'TYPE3': parse_c}

    # Consecutive pairs that share a TYPE line form one section.
    for type_line, group in groupby(tag(), itemgetter(0)):
        type_id, metadata = type_line.split(' ', 1)
        type2parser[type_id](metadata, imap(itemgetter(1), group))


def main():
    parse(sys.stdin)


if __name__ == '__main__':
    main()
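
For anyone trying this on Python 3, here is a rough adaptation of the same grouping idea, offered as a sketch rather than a drop-in replacement. It swaps imap for the built-in map and str.split for str.partition. The names tag_lines, collect, sample and parsers are made up for the illustration, and the explicit check for data before the first TYPE line is an addition that the code above does not have.

```python
from itertools import groupby
from operator import itemgetter


def tag_lines(lines):
    """Yield (type_line, data_line) pairs; TYPE lines themselves are not yielded."""
    type_line = None
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('TYPE'):
            type_line = line
        else:
            yield (type_line, line)


def parse(lines, parsers):
    """Group data lines by their TYPE line and hand each group to its parser."""
    results = []
    for type_line, group in groupby(tag_lines(lines), key=itemgetter(0)):
        if type_line is None:
            # Assumption: data before any TYPE line is treated as an error.
            raise ValueError('data line found before the first TYPE line')
        type_id, _, metadata = type_line.partition(' ')
        data = map(itemgetter(1), group)  # lazy, like imap in Python 2
        results.append(parsers[type_id](metadata, data))
    return results


def collect(metadata, lines):
    # Hypothetical stand-in parser: it only gathers what it was given.
    return (metadata, list(lines))


sample = [
    'TYPE1 first\n',
    'a 1\n',
    'a 2\n',
    'TYPE2 second\n',
    'b 1\n',
]
parsers = {'TYPE1': collect, 'TYPE2': collect}
result = parse(sample, parsers)
print(result)
```

Two limits of the approach are worth keeping in mind. A section that has a TYPE line but no data lines never reaches a parser, because the tagging step only yields data lines. And two adjacent sections whose TYPE lines are identical end up in one group, since groupby only sees the key.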