[XML-SIG] Help parsing XML
Uche Ogbuji
Uche.Ogbuji at fourthought.com
Thu Mar 31 19:22:38 CEST 2005
On Tue, 2005-03-29 at 15:33 -0600, Greg Lindstrom wrote:
> Hello-
> I have a general (I guess) xml parsing question that I hope has an
> answer. I am busy parsing health care claim records using xpath and do
> not see a way to parse the following (stripped down) file (I've added
> lines to group my problem...)
>
>  1. + <seg id='ST'>
>  2. + <loop id='HEADER'>
>  3. - <loop id='DETAIL'>
>  4.   - <loop id='2000A'>
>  5.     + <seg id='HL'>
>  6.     + <loop id='2000AA'>
>  7.     + <loop id='2000B'>
>  8.       + <seg id='HL'>       ----+
>  9.       + <seg id='SBR'>          |
> 10.       + <loop id='2010BA'>      | Group 1
> 11.       + <loop id='2010BB'>      |
> 12.       + <loop id='2300'>    ----+
> 13.       + <seg id='HL'>       ----+
> 14.       + <seg id='SBR'>          |
> 15.       + <loop id='2010BA'>      |
> 16.       + <loop id='2010BB'>      | Group 2
> 17.       + <loop id='2300'>    ----+
> 18.     </loop>
> 19.   </loop>
> 20. </loop>
>
> What I need to do is process the records from lines 8-12 as a group,
> then the records from lines 13-17 as another group. Each of the "HL"
> segments indicates the beginning of a new set of records to process. I
> would think that the xml should (would/could) be defined so that each of
> the HL statements would start a new loop structure, but that's not how
> it's defined and I can't change it. There is no way of knowing how many
> lines will be in each set of records, or how many HL segments will be
> beneath the 2000B loop, so is there a way I can logically group the
> record segments together to form a packet of records to process?
>
> Thanks for any attention/help you can pass my way.
You've heard a lot of suggestions, and they're all good, but I couldn't
help posting a neat Amara recipe for such grouping:
-- % --
from amara import binderytools
XML="""\
<doc>
<!-- Each a element is an implicit group extending to the next a -->
<a id="1"/>
<b id="1.1"/>
<c id="1.2"/>
<a id="2"/>
<b id="2.1"/>
<c id="2.2"/>
<a id="3"/>
<b id="3.1"/>
<c id="3.2"/>
</doc>
"""
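# New, empty output document with a <doc> root element
top = binderytools.create_document(u"doc")
container = None
# pushbind yields each element matching the pattern, one at a time
for e in binderytools.pushbind('/doc/*', string=XML):
    if e.nodeName == u"a":
        # Each a element starts a new group
        container = e
        top.doc.xml_append(e)
    else:
        # Anything else belongs to the most recent a element
        container.xml_append(e)
print top.xml(indent=u"yes")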
-- % --
The output is:
<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <a id="1">
    <b id="1.1"/>
    <c id="1.2"/>
  </a>
  <a id="2">
    <b id="2.1"/>
    <c id="2.2"/>
  </a>
  <a id="3">
    <b id="3.1"/>
    <c id="3.2"/>
  </a>
</doc>
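The same "start a new group at each marker element" idea can be sketched with only the standard library, applied to markup shaped like the one in the question. This is a minimal sketch, not part of the Amara recipe: the sample document is an assumption (empty seg and loop elements standing in for the real claim data), and group_by_marker and is_hl are hypothetical helper names introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample data: a cut-down 2000B loop with empty child elements.
XML = """\
<loop id='2000B'>
  <seg id='HL'/>
  <seg id='SBR'/>
  <loop id='2010BA'/>
  <loop id='2010BB'/>
  <loop id='2300'/>
  <seg id='HL'/>
  <seg id='SBR'/>
  <loop id='2010BA'/>
  <loop id='2300'/>
</loop>
"""


def group_by_marker(parent, is_marker):
    """Split the children of parent into lists, one list per marker element."""
    groups = []
    for child in parent:
        if is_marker(child):
            # A marker opens a new group and is its first member.
            groups.append([child])
        elif groups:
            # Any other child joins the most recently opened group.
            groups[-1].append(child)
        # Children that appear before the first marker are skipped.
    return groups


def is_hl(elem):
    """True for <seg id='HL'>, the element that starts each record set."""
    return elem.tag == "seg" and elem.get("id") == "HL"


loop = ET.fromstring(XML)
groups = group_by_marker(loop, is_hl)
for number, group in enumerate(groups, 1):
    print(number, [e.get("id") for e in group])
```

Each group is a plain list of elements, so a set of records can be processed as a unit no matter how many segments follow its HL, and the original tree is left unchanged.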
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com