[Tutor] Parsing a multi-line/record text file
Marc
marc at marcd.org
Sun Nov 11 06:01:50 CET 2012
Hello,
I am trying to parse a text file with a structure that looks like:
[record: Some text about the record]
Attribute 1 = Attribute 1 text
Attribute 3 = Attribute 3 text
Attribute 4 = Attribute 4 text
Attribute 7 = Attribute 7 text
[record: Some text about the record]
Attribute 1 = Attribute 1 text
Attribute 2 = Attribute 2 text
Attribute 3 = Attribute 3 text
Attribute 4 = Attribute 4 text
Attribute 5 = Attribute 5 text
Attribute 6 = Attribute 6 text
[record: Some text about the record]
Attribute 2 = Attribute 2 text
Attribute 3 = Attribute 3 text
Attribute 7 = Attribute 7 text
Attribute 8 = Attribute 8 text
Etc. for many hundreds of records
I am looking to create output that looks like:
Attribute 1 text | Attribute 3 text
Attribute 1 text | Attribute 3 text
Blank | Attribute 3 text
Treating each record as a record with its associated lines is the holy grail
for which I am searching, yet I seem to only be coming up with dead parrots.
It should be simple, but the answer is eluding me and Google has not been
helpful.
Pathetic thing is that I do this with Python and XML all the time, but I
can't seem to figure out a simple text file. I'm missing something simple,
I'm sure. Here's the most I have gotten to work (poorly) so far - it gets
me the correct data, but not in the correct format because the file is being
handled sequentially, not by record - it's not even close, but I thought I'd
include it here:
for line in infile:
    if line != '\n':  # skip blank lines
        # Both values are reset on every line, so nothing is kept per record
        Attribute1 = 'Blank'
        Attribute3 = 'Blank'
        line = line.lstrip('\t')
        line = line.rstrip('\n')
        LineElements = line.split('=')
        if LineElements[0] == 'Attribute1 ':
            Attribute1 = LineElements[1]
        if LineElements[0] == 'Attribute3 ':
            Attribute3 = LineElements[1]
        print("%s | %s\n" % (Attribute1, Attribute3))
Is there a library or example I could be looking at for this? I use lxml
for xml, but I don't think it will work for this - at least the way I tried
did not.
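For reference, a minimal sketch of the record-by-record grouping described above, using only built-in string methods. It assumes every record starts with a line beginning "[record:" and that attribute lines have the form "name = value". The names parse_records, format_record and sample are hypothetical, made up for this illustration:

```python
def parse_records(lines):
    """Group 'name = value' lines under the '[record: ...]' header before them."""
    records = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('[record:'):
            # A header line starts a new record (assumed header format).
            current = {}
            records.append(current)
        elif '=' in line and current is not None:
            # Split on the first '=' only, then trim surrounding whitespace.
            key, _, value = line.partition('=')
            current[key.strip()] = value.strip()
    return records


def format_record(record, wanted=('Attribute 1', 'Attribute 3')):
    """Join the wanted attributes with ' | ', using 'Blank' for missing ones."""
    return ' | '.join(record.get(name, 'Blank') for name in wanted)


# Shortened sample in the same shape as the file described above.
sample = """\
[record: Some text about the record]
Attribute 1 = Attribute 1 text
Attribute 3 = Attribute 3 text
[record: Some text about the record]
Attribute 2 = Attribute 2 text
Attribute 3 = Attribute 3 text
"""

for record in parse_records(sample.splitlines()):
    print(format_record(record))
```

Because parse_records only iterates over lines, an open file object could be passed in place of sample.splitlines(). Keeping each record as a dict is what lets a missing attribute fall back to 'Blank' once per record rather than once per line.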
Thank you,
Marc