[Tutor] formatting xml (again)

richard kappler richkappler at gmail.com
Wed Dec 28 10:10:06 EST 2016


It occurred to me last night on the drive home that I should just run this
through an xml parser, and lo and behold, this email was sitting in my
inbox when I got home. Having tried that, I found my data is not as clean as
I first thought. It seems like a fairly simple fix, but durned if I can
figure out how to do it. One of the problems is data such as this (as
viewed in the text editor; this is a log, not a stream):

1 \x02 data data data \x03\x02 more data more data more data \x03\x02 even more data even
2 more data even more data\x03\x02 Mary had a little\x03\x02 lamb whose fleece was white as
3 snow\x03\x02

and so on. The 1, 2 and 3 at the start of the lines above are just the text
editor's line numbers; they do not actually exist in the file.

How do I read in the file, either in its entirety or line by line, and
output the text with one event, \x02 the event data \x03, per line, so that
whenever Python sees a \x03 it starts a new line and carries on?
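A minimal sketch of one way to do this, written for Python 3 (the thread uses 2.7, so small changes may be needed there) and assuming the log is UTF-8 text. The function names and file paths are hypothetical. It reads line by line and buffers text until a \x03 arrives, so it copes with several events on one line as well as one event spread over several lines, collapsing any line breaks inside an event into single spaces.

```python
import io

STX = "\x02"  # marks the start of an event
ETX = "\x03"  # marks the end of an event


def flatten(body):
    """Collapse a possibly multi-line event body onto a single line."""
    parts = (line.strip() for line in body.splitlines())
    return " ".join(part for part in parts if part)


def iter_events(lines):
    """Yield the flattened body of each complete STX...ETX event.

    Only the current line plus any unfinished event is kept in memory,
    so a large log never has to be read in all at once.
    """
    buffer = ""
    for line in lines:
        buffer += line
        # One line may close several events, or none at all.
        while ETX in buffer:
            chunk, buffer = buffer.split(ETX, 1)
            # Keep only what follows the STX; skip stray text with no STX.
            _, sep, body = chunk.partition(STX)
            if sep:
                yield flatten(body)
    # Anything still in buffer is an event with no closing ETX; it is dropped.


def reformat_log(in_path, out_path):
    """Write each event to out_path as STX + data + ETX on its own line."""
    with io.open(in_path, encoding="utf-8") as src, \
            io.open(out_path, "w", encoding="utf-8") as dst:
        for body in iter_events(src):
            dst.write(STX + body + ETX + "\n")
```

For example, reformat_log("events.log", "events_one_per_line.log"). To work on the whole file at once instead, read it into a string and pass it as a one-item list, iter_events([text]). Leading and trailing spaces inside each event are stripped, so "\x02 data data data \x03" comes out as "\x02data data data\x03", and the trailing \x02 of an event that is still incomplete at the end of the log is not written out.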

On Tue, Dec 27, 2016 at 7:46 PM, David Rock <david at graniteweb.com> wrote:

> * Alan Gauld via Tutor <tutor at python.org> [2016-12-28 00:40]:
> > On 27/12/16 19:44, richard kappler wrote:
> > > Using python 2.7 - I have a large log file we recorded of streamed xml
> > > data that I now need to feed into another app for stress testing. The
> > > problem is the data comes in 2 formats.
> > >
> > > 1. each 'event' is a full set of xml data with opening and closing
> > > tags + \x02 and \x03 (stx and etx)
> > >
> > > 2. some events have all the xml data on one 'line' in the log, others
> > > are in typical nested xml format with lots of white space and multiple
> > > 'lines' in the log for each event, the first line of the 'event'
> > > starting with an stx and the last line of the 'event' ending in an etx.
> >
> > It sounds as if an xml parser should work for both. After all,
> > xml doesn't care about layout and whitespace etc.
> >
> > Which xml parser are you using? I assume you are not trying
> > to parse it manually using regex or string methods; that's
> > rarely a good idea for xml.
>
> Yeah, since everything appears to be <data>..</data>, the "event" flags
> of [\x02] [\x03] may not even matter if you use an actual parser.
>
> --
> David Rock
> david at graniteweb.com
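Following Alan's and David's suggestion, here is a minimal sketch of letting ElementTree (Python 3 standard library) do the work, under some assumptions worth stating. The \x02/\x03 markers still have to be removed, because those control characters are not legal in XML 1.0 and the parser rejects them. Several top-level events are not one well-formed document, so the sketch wraps them in a made-up root element ("log" is an arbitrary name). ElementTree also keeps pretty-printing whitespace as whitespace-only text, which the sketch clears so each event re-serializes on one line. It further assumes no event carries its own <?xml ...?> declaration, and it reads the whole log into memory. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

STX = "\x02"  # marks the start of an event
ETX = "\x03"  # marks the end of an event


def strip_layout(elem):
    """Drop whitespace-only text and tails left over from pretty-printing."""
    for node in elem.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None


def events_from_log(text):
    """Parse a log of concatenated XML events; return each one on one line."""
    # STX/ETX are not legal XML 1.0 characters, so remove them first.
    cleaned = text.replace(STX, "").replace(ETX, "")
    # Several top-level elements do not form one well-formed document, so
    # wrap them in a made-up root element ("log" is an arbitrary name).
    root = ET.fromstring("<log>" + cleaned + "</log>")
    lines = []
    for event in root:
        strip_layout(event)
        # Re-add the markers the receiving app presumably expects.
        lines.append(STX + ET.tostring(event, encoding="unicode") + ETX)
    return lines
```

Clearing whitespace-only text is only safe if no element's content is meaningfully blank; that seems likely for replaying a log in a stress test, but it is a judgment call about the data.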

