Help parsing a text file

Thomas Jollans t at jollybox.de
Mon Aug 29 17:05:23 EDT 2011


On 29/08/11 20:21, William Gill wrote:
> I haven't done much with Python for a couple years, bouncing around
> between other languages and scripts as needs suggest, so I have some
> minor difficulty keeping Python functionality in my
> head, but I can overcome that as the cobwebs clear.  Though I do seem to
> keep tripping over the same Py2 -> Py3 syntax changes (old habits die
> hard).
> 
> I have a text file with XML-like records that I need to parse.  By
> XML-like I mean records have proper opening and closing tags, but fields
> don't have closing tags (they rely on line ends).  Not all fields appear
> in all records, but they do adhere to a defined sequence.
> 
> My initial passes into Python have been very unfocused (a scatter gun of
> too many possible directions, yielding very messy results), so I'm
> asking for some suggestions, or algorithms (possibly even examples) that
> may help me focus.
> 
> I'm not asking anyone to write my code, just to nudge me toward a more
> disciplined approach to a common task, and I promise to put in the
> effort to understand the underlying fundamentals.

A name that is often thrown around on this list for this kind of
question is pyparsing. Now, I don't know anything about it myself, but
it may be worth looking into.

Otherwise, since you say the format is similar to XML, you might want to
take a cue from XML processing when dealing with the file. You could
emulate the stream-based approach taken by SAX or Expat: have methods
that handle the different events that can occur. For XML these are
"start tag", "end tag", "text node", "processing instruction", etc.; in
your case, they might be "start/end record", "field data", etc. That
way, you can separate the code that keeps track of the current record
(and how the data fits together to make an object structure) from the
parsing code, which knows how to convert a line of data into something
meaningful.
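
A rough sketch of that split might look like the following. I haven't
seen your file, so the format is purely an assumption: records open
with "<record>" and close with "</record>", and each field is a single
"<name>value" line with no closing tag. The names RecordBuilder, parse
and FIELD_RE are made up for illustration.

```python
import re

# ASSUMPTION: a field is "<name>value" on one line, with no closing tag.
FIELD_RE = re.compile(r"<(\w+)>(.*)")


class RecordBuilder:
    """Handler: tracks the current record and assembles the results."""

    def __init__(self):
        self.records = []
        self.current = None

    def start_record(self):
        self.current = {}

    def field(self, name, value):
        if self.current is None:
            raise ValueError("field outside of a record: %r" % name)
        self.current[name] = value

    def end_record(self):
        if self.current is None:
            raise ValueError("end of record without a start")
        self.records.append(self.current)
        self.current = None


def parse(lines, handler, record_tag="record"):
    """Parser: turns each line into an event and calls the handler."""
    # ASSUMPTION: the record tags sit on lines of their own.
    start = "<%s>" % record_tag
    end = "</%s>" % record_tag
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line == start:
            handler.start_record()
        elif line == end:
            handler.end_record()
        else:
            match = FIELD_RE.fullmatch(line)
            if match is None:
                raise ValueError("unrecognised line: %r" % line)
            handler.field(match.group(1), match.group(2).strip())


sample = """\
<record>
<name>Alice
<phone>555-0100
</record>
<record>
<name>Bob
</record>
"""

builder = RecordBuilder()
parse(sample.splitlines(), builder)
print(builder.records)
```

The parser only knows what a line looks like, and the handler only
knows what a record is. If the fields must follow a defined sequence,
that check could go in the handler's field() method without touching
the parsing code.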

Thomas


