Simple text parsing gets difficult when line continues to next line

Tim Hochberg tim.hochberg at ieee.org
Tue Nov 28 15:55:54 EST 2006


John Machin wrote:
> Jacob Rael wrote:
>> Hello,
>>
>> I have a simple script to parse a text file (a visual basic program)
>> and convert key parts to tcl. Since I am only working on specific
>> sections and I need it quick, I decided not to learn/try a full-blown
>> parsing module. My simple script works well until it runs into
>> functions that straddle multiple lines. For example:
>>
>>   Call mass_write(&H0, &HF, &H4, &H0, &H5, &H0, &H6, &H0, &H7, &H0, &H8, &H0, _
>>                 &H9, &H0, &HA, &H0, &HB, &H0, &HC, &H0, &HD, &H0, &HE, &H0, &HF, &H0, -1)
>>
>>
>> I read in each line with:
>>
>> for line in open(fileName).readlines():
>>
>> I would like to identify if a line continues (if line.endswith('_'))
>> and concatenate it with the next line:
>>
>> line = line + nextLine
>>
>> How can I get the next line when I am in a for loop using readlines?
> 
> Don't do that. I'm rather dubious about approaches that try to grab the
> next line on the fly, e.g. fp.next(). Here's a function that takes a
> list of lines and returns another with all trailing whitespace removed
> and the continued lines glued together. It uses a simple state machine
> approach.
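John's list-based function itself is snipped from this post. As a rough, hypothetical sketch of the kind of state machine he describes (the name `join_continued` and every detail below are assumptions, not his code), two states are enough: either no continued line is pending, or a partial logical line is waiting for the next physical line.

```python
def join_continued(lines):
    """Hypothetical sketch, not John Machin's snipped function.

    Take a list of physical lines and return a new list of logical lines,
    with trailing whitespace removed and any line ending in '_' glued to
    the line that follows it.
    """
    result = []
    pending = None  # state: None means "not inside a continued line"
    for raw in lines:
        line = raw.rstrip()
        if pending is not None:
            # CONTINUING state: this physical line extends the pending one
            line = pending + line
            pending = None
        if line.endswith('_'):
            # enter (or stay in) the CONTINUING state; drop the '_' marker
            pending = line[:-1]
        else:
            result.append(line)
    if pending is not None:
        # input ended while a continuation was still open
        raise ValueError("last line is continued: %r" % pending)
    return result
```

Because it works on a whole list and returns a whole list, something like `join_continued(open(fileName).readlines())` would hand the parsing code complete logical lines, keeping line assembly out of the parser.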

I agree that mixing the line assembly and parsing is probably a mistake,
although using next explicitly is fine as long as you're careful with it.
For instance, I would be wary of using the mixed for-loop/next strategy
that some of the previous posts suggested. Here's a different,
generator-based implementation of the same idea that, for better or for
worse, is considerably less verbose:

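def continue_join_2(linesin):
    # Python 2 spelled this iter(linesin).next
    getline = iter(linesin).__next__
    while True:
        try:
            buffer = getline().rstrip()
        except StopIteration:
            return  # input exhausted (PEP 479: a leaked StopIteration becomes RuntimeError)
        try:
            # glue on physical lines while the logical line ends in '_'
            while buffer.endswith('_'):
                buffer = buffer[:-1] + getline().rstrip()
        except StopIteration:
            raise ValueError("last line is continued: %r" % buffer)
        yield buffer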

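If the whole file fits in memory (e.g. `open(fileName).read()`), another option, not one suggested in the thread, is to remove the continuations from the complete text with a regular expression and only then split it into lines and parse. The sketch below keeps the two stages separate, as argued above; the names `logical_lines`, `vb_int` and `parse_calls` are hypothetical. Unlike `continue_join_2`, it does not strip trailing whitespace from lines that are not continued, and it reads `&H` literals as non-negative, ignoring how VB itself types large hex constants.

```python
import re

# '_' followed by optional blanks and a line break marks a continuation
_CONTINUATION = re.compile(r'_[ \t]*\r?\n')
# a logical line of the form "Call name(args)"
_CALL = re.compile(r'^\s*Call\s+(\w+)\((.*)\)\s*$')


def logical_lines(text):
    """Stage 1: glue continued lines, then split into logical lines."""
    return _CONTINUATION.sub('', text).splitlines()


def vb_int(token):
    """Convert a VB integer literal such as '&HF' or '-1' to a Python int.

    Assumption: '&H' literals are treated as plain non-negative hex numbers.
    """
    token = token.strip()
    if token.upper().startswith('&H'):
        return int(token[2:], 16)
    return int(token)


def parse_calls(text):
    """Stage 2: yield (name, [int args]) for each 'Call name(...)' line."""
    for line in logical_lines(text):
        m = _CALL.match(line)
        if m:
            name, args = m.groups()
            # skip empty pieces so that "Call f()" yields an empty list
            yield name, [vb_int(a) for a in args.split(',') if a.strip()]
```

Swapping stage 1 for `continue_join_2` (applied to the open file) would leave stage 2 unchanged, which is the practical benefit of not mixing the two.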
-tim

[SNIP]




More information about the Python-list mailing list