Processing XML File

Adam Tauno Williams awilliam at opengroupware.us
Fri Jan 29 13:54:26 EST 2010


On Fri, 2010-01-29 at 10:34 -0800, jakecjacobson wrote:
> On Jan 29, 1:04 pm, Adam Tauno Williams <awill... at opengroupware.us>
> wrote:
> > On Fri, 2010-01-29 at 09:25 -0800, jakecjacobson wrote:
> > > I need to take an XML web resource and split it up into smaller XML
> > > files.  I am able to retrieve the web resource but I can't find any
> > > good XML examples.  I am just learning Python so forgive me if this
> > > question has been answered many times in the past.
> > > My resource is like:
> > > <document>
> > >      ...
> > >      ...
> > > </document>
> > > <document>
> > > </document>
> > > So in this example, I would need to output 2 files, each containing
> > > what is between one pair of open and close document tags.
> > Do you want to parse the whole document, or use SAX?
> > I have a SAX example at
> > <http://coils.hg.sourceforge.net/hgweb/coils/coils/file/99b227b08f7f/s...>
> Thanks, but I am way over my head with XML and Python.  I am working
> with DDMS and need to output the individual resource nodes to their own
> files.  I hope that this helps; I need a good example and an
> explanation of how to use it.


If that is all you need, XPath will split it apart for you, as in
<http://coils.hg.sourceforge.net/hgweb/coils/coils/file/99b227b08f7f/src/coils/logic/workflow/actions/xml/xpath.py>


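doc = etree.parse(self._rfile)  # self._rfile: the input file in the linked code
results = doc.xpath(xpath)      # xpath: the query string
for result in results:
    # Assuming the query matches elements: tostring() gives their XML text
    print(etree.tostring(result))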

For example, if your XML has an outermost ResultSet element with inner row elements, just do:
for record in doc.xpath(u'/ResultSet/row'):
    print(etree.tostring(record))

Implied import for these examples is "from lxml import etree"


> Here is what a resource node looks like:
>       <ddms:Resource
>         xsi:schemaLocation="https://metadata.dod.mil/mdr/ns/DDMS/1.4/
> https://metadata.dod.mil/mdr/ns/DDMS/1.4/"
>         xmlns:ddms="https://metadata.dod.mil/mdr/ns/DDMS/1.4/"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xmlns:ICISM="urn:us:gov:ic:ism:v2">
>         <ddms:identifier ddms:qualifier="URL"
>           ddms:value="https://metadata.dod.mil/mdr/ns/TBD/1.0/SampleTaxonomy.owl"/>
>         <ddms:identifier
>           ddms:qualifier="https://metadata.dod.mil/mdr/ns/MDR/1.0/MDR.owl#GovernanceNamespace"
>           ddms:value="TBD"/>
>         <ddms:identifier ddms:qualifier="Version" ddms:value="1.0"/>
>         <ddms:title ICISM:ownerProducer="USA"
> ICISM:classification="U">Sample Taxonomy</ddms:title>
>         <ddms:description ICISM:ownerProducer="USA"
> ICISM:classification="U">
>           This is a sample taxonomy created for the Help page.
>         </ddms:description>
>         <ddms:dates ddms:posted="2007-11-24"/>
>         <ddms:creator ICISM:ownerProducer="USA"
> ICISM:classification="U">
>           <ddms:Person>
>             <ddms:name>Sample</ddms:name>
>             <ddms:surname>Developer</ddms:surname>
>             <ddms:affiliation>FGM, Inc.</ddms:affiliation>
>             <ddms:phone>703-885-1000</ddms:phone>
>             <ddms:email>sampleDeveloper at fgm.com</ddms:email>
>           </ddms:Person>
>         </ddms:creator>
>         <ddms:security ICISM:ownerProducer="USA"
> ICISM:classification="U" ICISM:nonICmarkings="DIST_STMT_A" />
>         <!-- Other DDMS elements may appear here. -->
>       </ddms:Resource>
> 
> You can see the DDMS site at https://metadata.dod.mil/.


-- 
OpenGroupware developer: awilliam at whitemice.org
<http://whitemiceconsulting.blogspot.com/>
OpenGroupware & Cyrus IMAPd documentation @
<http://docs.opengroupware.org/Members/whitemice/wmogag/file_view>



