xml.sax parsing elements with the same name
Stefan Behnel
stefan_ml at behnel.de
Tue Jan 12 03:13:35 EST 2010
amadain, 11.01.2010 20:13:
> I have an event log with 100s of thousands of entries with logs of the
> form:
>
> <event eventTimestamp="2009-12-18T08:22:49.035"
> uniqueId="1261124569.35725_PFS_1_1340035961">
> <result value="Blocked"/>
> <filters>
> <filter code="338" type="Filter_Name">
> <diagnostic>
> <result value="Triggered"/>
> </diagnostic>
> </filter>
> <filter code="338" type="Filter_Name">
> <diagnostic>
> <result value="Blocked"/>
> </diagnostic>
> </filter>
> </filters>
> </event>
>
> I am using xml.sax to parse the event log.
You should give ElementTree's iterparse() a try (xml.etree package).
Instead of a stream of simple events, it will give you a stream of
subtrees, which are a lot easier to work with. You can intercept the event
stream on each 'event' tag, handle it completely in one obvious code step,
and then delete any content you are done with to save memory.
It's also very fast, so you will likely not lose much performance compared
to xml.sax.
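A minimal sketch of that approach, using only the standard library. It assumes the log wraps its entries in a single root element (called <events> here, which is a guess, as is the iter_events() helper name and the shape of the dict it yields). The sample data is the entry quoted above.

```python
import io
import xml.etree.ElementTree as ET

# Sample input: the quoted entry, wrapped in an assumed <events> root.
SAMPLE = b"""<events>
<event eventTimestamp="2009-12-18T08:22:49.035"
       uniqueId="1261124569.35725_PFS_1_1340035961">
  <result value="Blocked"/>
  <filters>
    <filter code="338" type="Filter_Name">
      <diagnostic>
        <result value="Triggered"/>
      </diagnostic>
    </filter>
    <filter code="338" type="Filter_Name">
      <diagnostic>
        <result value="Blocked"/>
      </diagnostic>
    </filter>
  </filters>
</event>
</events>"""


def iter_events(source):
    """Yield one summary dict per <event> subtree (hypothetical helper)."""
    # "end" events fire once an element and all its children are parsed.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "event":
            continue
        # A plain tag name matches direct children only, so this is the
        # event's own <result>, not the ones nested inside <diagnostic>.
        overall = elem.find("result")
        nested = elem.iterfind("filters/filter/diagnostic/result")
        yield {
            "id": elem.get("uniqueId"),
            "result": overall.get("value") if overall is not None else None,
            "filter_results": [r.get("value") for r in nested],
        }
        # Drop the subtree's children and attributes once we are done.
        elem.clear()


records = list(iter_events(io.BytesIO(SAMPLE)))
for record in records:
    print(record)
```

Because each 'event' arrives as a complete subtree, the two kinds of 'result' element are told apart by their path rather than by tracking parser state by hand. For a real log you would pass the file name or an open file instead of the BytesIO object. Note that the cleared, empty 'event' elements stay attached to the root, so for very large files you may also want to remove them from their parent.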
Stefan