SAX/Python : read an xml from the end to the top
peter at engcorp.com
Tue Mar 7 13:25:21 CET 2006
> The input xml I am parsing is always well formed. It is coming out from
> another application that appends to this xml. I didn't see the source
> code of the application, but I know that it is not re-writing the whole
> xml. I think it is just removing the closing root tag, adding the new
> tags and writing the </root> tag again.
If the writers had a clue, they probably just seek to the end of the
file minus len('</root>') (or whatever) and then overwrite with the new
entry and a new closing </root> tag. At least, that's what seemed like
the obvious approach when I had to do this once.
Not that this is particularly relevant to the problem. ;-)
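For the curious, the seek-and-overwrite trick described above might look roughly like the sketch below. The root element name, the append_entry helper, and the allowance for a few bytes of trailing whitespace are all assumptions made for illustration; nothing here is known about how the actual writing application works.

```python
import os

CLOSING_TAG = b"</root>"  # assumed name of the root element


def append_entry(path, entry):
    """Overwrite the trailing </root> with `entry` followed by a new </root>.

    Hypothetical helper: `entry` is a bytes string holding one complete
    XML fragment, e.g. b'<entry text="blah"/>'.
    """
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Read a short tail so a trailing newline after </root> is tolerated.
        tail_len = min(size, len(CLOSING_TAG) + 16)
        f.seek(size - tail_len)
        tail = f.read(tail_len)
        idx = tail.rfind(CLOSING_TAG)
        if idx == -1:
            raise ValueError("closing tag not found at end of file")
        # Position on the '<' of </root> and write over it.
        f.seek(size - tail_len + idx)
        f.write(entry + b"\n" + CLOSING_TAG + b"\n")
        f.truncate()
```

The cost is independent of file size, since only the last few bytes are read and rewritten.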
> I guess I will parse it till I find the last reported event and update
> the output xml from there, reporting only the events I am interested
> in... I hope SAX won't take too much time to do all this... (let's say 1
> event = 10 tags, 5 events/minute, xml file running for 1 month -->
> 5400 000 opening tags)...
> What do you think?
I think (guessing wildly) you probably have a fairly restricted number
of possibilities being written to this file, possibly as simple as the
somewhat stereotypical '<entry text="blah blah"/>' type of thing which
I've seen lots of times.
If so, you can simply treat this as a text file which you process
manually, in whatever direct and crude fashion works best, such as by
seeking 1000 chars back from the end (assuming new entries are always
less than that length), scanning for the last "<entry" string, and
slicing and dicing till you find the stuff you need.
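A minimal sketch of that crude approach, assuming self-closing '<entry .../>' elements, UTF-8 text, and a 1000-byte window (bytes rather than characters, since the file is opened in binary mode so that seeking is exact). The function name last_entry and its parameters are made up for this example.

```python
import os


def last_entry(path, window=1000):
    """Return the last '<entry .../>' element found in the final
    `window` bytes of the file, or None if there is none.

    Assumes each entry is self-closing, shorter than `window`, and
    contains no '/>' inside an attribute value.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Jump back at most `window` bytes from the end of the file.
        f.seek(max(0, size - window))
        tail = f.read()
    start = tail.rfind(b"<entry")
    if start == -1:
        return None
    end = tail.find(b"/>", start)
    if end == -1:
        return None
    return tail[start:end + 2].decode("utf-8")
```

If the entries have more structure than this (child elements, say), the same idea still applies: find the last opening marker in the tail, then slice up to the matching closing marker.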
In other words, screw SAX, just grab the data directly and forget about
all those silly well-formed XML issues etc. Go for the simplest thing
that could possibly work, and if you don't need the complexity of SAX,
don't use it.