[XML-SIG] Help parsing XML

Thu Mar 31 19:22:38 CEST 2005

On Tue, 2005-03-29 at 15:33 -0600, Greg Lindstrom wrote:
> Hello-
> I have a general (I guess) xml parsing question that I hope has an 
> answer.  I am busy parsing health care claim records using xpath and do 
> not see a way to parse the following (stripped down) file (I've added 
> lines to group my problem...)
> 
> 1.  + <seg id='ST'>
> 2.  + <loop id='HEADER'>
> 3.  - <loop id='DETAIL'>
> 4.    - <loop id='2000A'>
> 5.      + <seg id='HL'>
> 6.      + <loop id='2000AA'>
> 7.      + <loop id='2000B'>
> 8.        + <seg id='HL'>   --------+
> 9.        + <seg id='SBR'>          |
> 10.       + <loop id='2010BA'>      | Group 1
> 11.       + <loop id='2010BB'>      |
> 12.       + <loop id='2300'>   -----+
> 13.       + <seg id='HL'>  ---------+
> 14.       + <seg id='SBR'>          |
> 15.       + <loop id='2010BA'>      |
> 16.       + <loop id='2010BB'>      | Group 2
> 17.       + <loop id='2300'>   -----+
> 18.               </loop>
> 19.           </loop>
> 20.       </loop>
> 
> What I need to do is process the records from lines 8-12 as a group, 
> then the records from lines 13-17 as another group.  Each of the "HL" 
> segments indicates the beginning of a new set of records to process.  I 
> would think that the xml should (would/could) be defined so that each of 
> the HL statements would start a new loop structure, but that's not how 
> it's defined and I can't change it.  There is no way of knowing how many 
> lines will be in each set of records, or how many HL segments will be 
> beneath the 2000B loop, so is there a way I can logically group the 
> record segments together to form a packet of records to process?
> 
> Thanks for any attention/help you can pass my way.

You've heard a lot of suggestions, and they're all good, but I couldn't
help posting a neat Amara recipe for such grouping:

-- % --

from amara import binderytools

XML="""\
<doc>
  <!-- Each a element is an implicit group extending to the next a -->
  <a id="1"/>
  <b id="1.1"/>
  <c id="1.2"/>

  <a id="2"/>
  <b id="2.1"/>
  <c id="2.2"/>

  <a id="3"/>
  <b id="3.1"/>
  <c id="3.2"/>
</doc>
"""

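# Create an empty output document with a <doc> root element
top = binderytools.create_document(u"doc")

container = None  # the <a> element currently collecting its followers
# pushbind streams each element child of /doc, one bound element at a time
for e in binderytools.pushbind('/doc/*', string=XML):
    if e.nodeName == u"a":
        # An <a> starts a new group: it becomes the current container
        container = e
        top.doc.xml_append(e)
    elif container is not None:
        # Any other element belongs to the most recent <a>
        container.xml_append(e)
    else:
        # Elements before the first <a> have no group; keep them at top level
        top.doc.xml_append(e)

print top.xml(indent=u"yes")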

-- % --

The output is:

<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <a id="1">
    <b id="1.1"/>
    <c id="1.2"/>
  </a>
  <a id="2">
    <b id="2.1"/>
    <c id="2.2"/>
  </a>
  <a id="3">
    <b id="3.1"/>
    <c id="3.2"/>
  </a>
</doc>
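For anyone without Amara, the same "each marker starts a group that runs until the next marker" idea can be sketched with the standard library's xml.etree.ElementTree. This is a minimal sketch, not part of Amara: the helper names group_by_marker and is_hl are hypothetical, and CLAIM is a simplified stand-in for the question's file (empty elements, same nesting and ids). Rather than restructuring the tree, it hands back each HL packet as a list of sibling elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the claim file in the question
CLAIM = """\
<loop id='DETAIL'>
  <loop id='2000A'>
    <seg id='HL'/>
    <loop id='2000AA'/>
    <loop id='2000B'>
      <seg id='HL'/>
      <seg id='SBR'/>
      <loop id='2010BA'/>
      <loop id='2010BB'/>
      <loop id='2300'/>
      <seg id='HL'/>
      <seg id='SBR'/>
      <loop id='2010BA'/>
      <loop id='2010BB'/>
      <loop id='2300'/>
    </loop>
  </loop>
</loop>
"""


def group_by_marker(parent, is_marker):
    """Yield lists of consecutive children of parent.

    Each child for which is_marker() is true starts a new list; any
    children before the first marker are yielded as a leading group.
    """
    group = []
    for child in parent:
        if is_marker(child) and group:
            yield group
            group = []
        group.append(child)
    if group:
        yield group


def is_hl(elem):
    # An HL segment marks the start of a new packet of records
    return elem.tag == "seg" and elem.get("id") == "HL"


root = ET.fromstring(CLAIM)
# Only look at the direct children of the 2000B loop, so the HL
# belonging to 2000A is not picked up
loop_2000b = root.find(".//loop[@id='2000B']")
for n, packet in enumerate(group_by_marker(loop_2000b, is_hl), start=1):
    print(n, [e.get("id") for e in packet])
```

With the sample above this prints two packets, each `['HL', 'SBR', '2010BA', '2010BB', '2300']`. Because the grouping only depends on the order of siblings, it does not matter how many records follow each HL or how many HL segments the loop contains.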

-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com