[XML-SIG] Help parsing XML
wunder at verity.com
Wed Mar 30 01:12:50 CEST 2005
I'd just use a SAX interface. When you see id=HL as an attribute,
close the old record and start a new one. At end of file, close
the last record. Done.
Generally, if the structure is fairly fixed and you are extracting
the data, think about using SAX. If the shape of the structure
carries a lot of the information, you might need a DOM.
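A minimal sketch of that SAX approach, using the standard xml.sax module.
The class name HLGrouper, the parent_id parameter and the SAMPLE document
are made up for illustration, and restricting the grouping to the direct
children of the 2000B loop is my assumption about what is wanted. Each
record here is just a list of id values; real code would collect the
segment data instead.

```python
import xml.sax


class HLGrouper(xml.sax.ContentHandler):
    """Group the direct children of one loop into records that start at <seg id='HL'>."""

    def __init__(self, parent_id="2000B"):
        super().__init__()
        self.parent_id = parent_id   # id of the loop whose children get grouped (assumed)
        self.depth = 0               # current element nesting depth
        self.parent_depth = None     # depth of the open parent loop, if any
        self.current = None          # ids collected for the record being built
        self.groups = []             # finished records

    def _close_record(self):
        if self.current:
            self.groups.append(self.current)
        self.current = None

    def startElement(self, name, attrs):
        self.depth += 1
        elem_id = attrs.get("id")
        if self.parent_depth is None:
            if name == "loop" and elem_id == self.parent_id:
                self.parent_depth = self.depth
        elif self.depth == self.parent_depth + 1:
            # Direct child of the parent loop.
            if name == "seg" and elem_id == "HL":
                self._close_record()     # close the old record ...
                self.current = []        # ... and start a new one
            if self.current is not None:
                self.current.append(elem_id)

    def endElement(self, name):
        if self.parent_depth is not None and self.depth == self.parent_depth:
            # Leaving the parent loop: close the record in progress.
            self._close_record()
            self.parent_depth = None
        self.depth -= 1

    def endDocument(self):
        # Same thing at end of file.
        self._close_record()


# Hypothetical, stripped-down stand-in for the claim file.
SAMPLE = b"""<loop id='DETAIL'><loop id='2000A'><seg id='HL'/><loop id='2000AA'/>
<loop id='2000B'>
<seg id='HL'/><seg id='SBR'/><loop id='2010BA'/><loop id='2010BB'/><loop id='2300'/>
<seg id='HL'/><seg id='SBR'/><loop id='2010BA'/><loop id='2300'/>
</loop></loop></loop>"""

handler = HLGrouper()
xml.sax.parseString(SAMPLE, handler)
for group in handler.groups:
    print(group)
```

The handler only tracks depth and the current record, so memory use
stays proportional to one record rather than the whole file.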
--On Tuesday, March 29, 2005 03:00:36 PM -0800 Dan Gunter <dkgunter at lbl.gov> wrote:
> I can suggest where to start. You could use XSLT to transform it first,
> i.e. into <HL> ..stuff.. </HL> sections. The XSLT Cookbook (O'Reilly)
> recipe 6.8, "Deepening an XML hierarchy", should help. Or you could
> stream the tree through (e.g. PullDOM or ElementTree) and write the
> program logic to transform it in Python (assuming what's between HL tags
> fits into memory, but probably the XSLT approach has the same
> limitation). Hope that helps.
> Greg Lindstrom wrote:
>> I have a general (I guess) xml parsing question that I hope has an
>> answer. I am busy parsing health care claim records using XPath and
>> do not see a way to parse the following (stripped down) file (I've
>> added lines to group my problem...)
>> 1.  + <seg id='ST'>
>> 2.  + <loop id='HEADER'>
>> 3.  - <loop id='DETAIL'>
>> 4.    - <loop id='2000A'>
>> 5.      + <seg id='HL'>
>> 6.      + <loop id='2000AA'>
>> 7.      + <loop id='2000B'>
>> 8.        + <seg id='HL'>        -----+
>> 9.        + <seg id='SBR'>            |
>> 10.       + <loop id='2010BA'>        | Group 1
>> 11.       + <loop id='2010BB'>        |
>> 12.       + <loop id='2300'>     -----+
>> 13.       + <seg id='HL'>        -----+
>> 14.       + <seg id='SBR'>            |
>> 15.       + <loop id='2010BA'>        | Group 2
>> 16.       + <loop id='2010BB'>        |
>> 17.       + <loop id='2300'>     -----+
>> 18.     </loop>
>> 19.   </loop>
>> 20. </loop>
>> What I need to do is process the records from lines 8-12 as a group,
>> then the records from lines 13-17 as another group. Each of the "HL"
>> segments indicates the beginning of a new set of records to process.
>> I would think that the XML should (would/could) be defined so that
>> each of the HL statements would start a new loop structure, but that's
>> not how it's defined and I can't change it. There is no way of
>> knowing how many lines will be in each set of records, or how many HL
>> segments will be beneath the 2000B loop, so is there a way I can
>> logically group the record segments together to form a packet of
>> records to process?
>> Thanks for any attention/help you can pass my way.