[Tutor] Parsing a multi-line/record text file

Marc marc at marcd.org
Sun Nov 11 06:01:50 CET 2012


Hello,

I am trying to parse a text file with a structure that looks like:

[record: Some text about the record]
	Attribute 1 = Attribute 1 text
	Attribute 3 = Attribute 3 text
	Attribute 4 = Attribute 4 text
	Attribute 7 = Attribute 7 text

[record: Some text about the record]
	Attribute 1 = Attribute 1 text
	Attribute 2 = Attribute 2 text
	Attribute 3 = Attribute 3 text
	Attribute 4 = Attribute 4 text
	Attribute 5 = Attribute 5 text
	Attribute 6 = Attribute 6 text

[record: Some text about the record]
	Attribute 2 = Attribute 2 text
	Attribute 3 = Attribute 3 text
	Attribute 7 = Attribute 7 text
	Attribute 8 = Attribute 8 text

Etc., for many hundreds of records.
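The sample above has only two kinds of non-blank line: a bracketed header and an indented `name = value` line. Below is a minimal sketch of telling them apart with the standard `re` module. It assumes that headers always start with `[record:` and that each attribute line contains one `=`. The patterns `HEADER` and `ATTR` and the `classify` helper are hypothetical names for illustration.

```python
import re

# Hypothetical patterns, written against the sample layout only.
HEADER = re.compile(r'^\[record:\s*(?P<title>.*)\]\s*$')
ATTR = re.compile(r'^\s*(?P<name>[^=]+?)\s*=\s*(?P<value>.*?)\s*$')


def classify(line):
    """Return ('blank', None), ('header', title), ('attr', (name, value)) or ('other', line)."""
    if not line.strip():
        return 'blank', None
    m = HEADER.match(line)
    if m:
        return 'header', m.group('title')
    m = ATTR.match(line)
    if m:
        # name and value come back with the surrounding whitespace removed
        return 'attr', (m.group('name'), m.group('value'))
    return 'other', line
```

For example, `classify('\tAttribute 1 = Attribute 1 text\n')` gives `('attr', ('Attribute 1', 'Attribute 1 text'))`.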

I am looking to create output that looks like:

Attribute 1 text | Attribute 3 text
Attribute 1 text | Attribute 3 text
Blank            | Attribute 3 text
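If each record is held as a dict mapping attribute names to values, the "Blank" column comes directly from `dict.get` with a default. Here is a minimal sketch with the three sample records hard-coded. The `records` list and the `format_row` helper are illustrative, and the columns are not padded.

```python
def format_row(record, keys=('Attribute 1', 'Attribute 3'), missing='Blank'):
    # dict.get returns `missing` for any attribute the record does not have
    return ' | '.join(record.get(key, missing) for key in keys)


# The three sample records, typed in by hand for illustration.
records = [
    {'Attribute 1': 'Attribute 1 text', 'Attribute 3': 'Attribute 3 text',
     'Attribute 4': 'Attribute 4 text', 'Attribute 7': 'Attribute 7 text'},
    {'Attribute 1': 'Attribute 1 text', 'Attribute 2': 'Attribute 2 text',
     'Attribute 3': 'Attribute 3 text'},
    {'Attribute 2': 'Attribute 2 text', 'Attribute 3': 'Attribute 3 text',
     'Attribute 8': 'Attribute 8 text'},
]

for rec in records:
    print(format_row(rec))
```

The third record has no `Attribute 1`, so its row starts with `Blank`.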

Treating each record as a unit together with its associated lines is the holy
grail for which I am searching, yet I only seem to be coming up with dead
parrots. It should be simple, but the answer is eluding me and Google has not
been helpful.
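One way to get "each record with its associated lines" is `itertools.groupby`, keyed on whether a line is blank. Each run of consecutive non-blank lines comes out as one group, which is one record. This is a sketch that assumes records are separated by at least one blank line. `split_records` is a hypothetical helper name.

```python
from itertools import groupby


def split_records(lines):
    """Yield each run of non-blank lines as one record (a list of stripped lines)."""
    # groupby batches consecutive lines that share the same key, so every
    # run of non-blank lines between blank separators becomes one group.
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            yield [line.strip() for line in group]


sample = """[record: first]
\tAttribute 1 = a
\tAttribute 3 = b

[record: second]
\tAttribute 3 = c
""".splitlines()

for rec in split_records(sample):
    print(rec)
```

The same function accepts an open file object, because iterating over a file yields its lines.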

The pathetic thing is that I do this with Python and XML all the time, but I
can't seem to figure out a simple text file. I'm missing something simple,
I'm sure. Here's the most I have gotten to work (poorly) so far. It gets me
the correct data, but not in the correct format, because the file is handled
line by line rather than record by record. It's not even close, but I thought
I'd include it here:

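    # Line-by-line attempt: every line is handled on its own, so the two
    # values are reset and printed once per line, not once per record.
    with open('records.txt') as infile:   # hypothetical filename
        for line in infile:
            line = line.strip()           # drop leading tab and trailing newline
            if not line:                  # skip blank separator lines
                continue
            Attribute1 = 'Blank'
            Attribute3 = 'Blank'
            LineElements = [part.strip() for part in line.split('=', 1)]
            if LineElements[0] == 'Attribute 1':
                Attribute1 = LineElements[1]
            if LineElements[0] == 'Attribute 3':
                Attribute3 = LineElements[1]
            print("%s | %s" % (Attribute1, Attribute3))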
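The attempt above resets and prints on every line. A record-at-a-time version keeps a dict for the current record and fills it from `name = value` lines. It emits that dict when a blank line, a new `[record` header, or the end of the file closes the record. This is a minimal sketch that assumes the layout in the sample. `parse_records` is a hypothetical name, and an `io.StringIO` stands in for the real file.

```python
import io


def parse_records(infile):
    """Yield one dict per record, mapping attribute name -> value."""
    current = None
    for line in infile:
        line = line.strip()
        if line.startswith('[record'):
            if current is not None:     # header arrived with no blank line before it
                yield current
            current = {}
        elif not line:
            if current is not None:     # blank line closes the record
                yield current
            current = None
        elif '=' in line and current is not None:
            name, value = line.split('=', 1)
            current[name.strip()] = value.strip()
    if current is not None:             # last record may have no trailing blank line
        yield current


sample = io.StringIO(
    "[record: one]\n\tAttribute 1 = A1\n\tAttribute 3 = A3\n\n"
    "[record: two]\n\tAttribute 2 = B2\n\tAttribute 3 = B3\n"
)
for rec in parse_records(sample):
    print("%s | %s" % (rec.get('Attribute 1', 'Blank'),
                       rec.get('Attribute 3', 'Blank')))
```

Printing happens only after a whole record has been collected, so missing attributes can be detected and filled with "Blank".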

Is there a library or example I could be looking at for this? I use lxml
for XML, but I don't think it will work for this; at least the way I tried
it did not.
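On the library question: the layout resembles an INI file, with bracketed section headers followed by `name = value` lines. The standard-library `configparser` module reads that format. This sketch comes with caveats:

* It assumes every `[record: ...]` header is unique. With the default `strict=True`, a header that appears twice raises `configparser.DuplicateSectionError`. With `strict=False`, the repeated sections are merged into one. If the real headers repeat, as the placeholder text in the sample does, a hand-written loop is needed instead.
* Option names are lowercased by default. Setting `optionxform = str` keeps them as written.
* `interpolation=None` switches off `%`-style interpolation in values.
* Each record's lines are assumed to be indented by the same amount, because a line indented deeper is read as a continuation of the previous value.

```python
import configparser

text = """[record: first]
\tAttribute 1 = Attribute 1 text
\tAttribute 3 = Attribute 3 text

[record: second]
\tAttribute 2 = Attribute 2 text
\tAttribute 3 = Attribute 3 text
"""

parser = configparser.ConfigParser(delimiters=('=',), interpolation=None)
parser.optionxform = str          # keep 'Attribute 1' rather than 'attribute 1'
parser.read_string(text)          # parser.read(filename) for a real file

rows = []
for section in parser.sections():
    rec = parser[section]
    # SectionProxy.get takes a fallback value for missing options
    rows.append('%s | %s' % (rec.get('Attribute 1', 'Blank'),
                              rec.get('Attribute 3', 'Blank')))
print('\n'.join(rows))
```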

Thank you,
Marc
