Processing XML File
Sells, Fred
fred.sells at adventistcare.org
Fri Jan 29 14:31:56 EST 2010
Google is your friend. ElementTree is one of the better-documented
modules, IMHO, but there are many modules that can do this.
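
A minimal sketch of what working with ElementTree looks like, in case it
helps. The sample XML and the element names title and body are made up for
illustration; the real resource will have its own structure.

```python
import xml.etree.ElementTree as ET

# Made-up sample document; element names are assumptions, not from the thread.
SAMPLE = "<document><title>First</title><body>Hello</body></document>"

# fromstring() parses a string holding ONE well-formed XML document
# and returns its root element.
root = ET.fromstring(SAMPLE)

# Iterating over an element yields its direct children.
fields = {child.tag: child.text for child in root}

print(root.tag, fields)
```
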
> -----Original Message-----
> From: python-list-bounces+frsells=adventistcare.org at python.org
> [mailto:python-list-bounces+frsells=adventistcare.org at python.org] On
> Behalf Of Stefan Behnel
> Sent: Friday, January 29, 2010 2:25 PM
> To: python-list at python.org
> Subject: Re: Processing XML File
>
> jakecjacobson, 29.01.2010 18:25:
> > I need to take an XML web resource and split it up into smaller XML
> > files. I am able to retrieve the web resource but I can't find any
> > good XML examples. I am just learning Python so forgive me if this
> > question has been answered many times in the past.
> >
> > My resource is like:
> >
> > <document>
> > ...
> > ...
> > </document>
> > <document>
> > ...
> > ...
> > </document>
>
> Is this what you get as a document or is this just /contained/ in the
> document?
>
> Note that XML does not allow more than one root element, so the above is
> not XML. Each of the two <document>...</document> parts forms an XML
> document by itself, though.
>
>
> > So in this example, I would need to output 2 files, each containing
> > what is between the opening and closing document tags.
>
> Are the two files formatted as you show above? In that case, you can simply
> iterate over the lines and cut the document when you see "<document>". Or,
> if you are sure that "<document>" only appears as top-most elements and not
> inside of the documents, you can search for "<document>" in the content
> (a string, I guess) and split it there.
>
> As was pointed out before, once you have these two documents, use the
> xml.etree package to work with them.
>
> Something like this might work:
>
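> import urllib2  # Python 2; on Python 3 use urllib.request and decode the bytes
> import xml.etree.ElementTree as ET
>
> data = urllib2.urlopen(url).read()
>
> for part in data.split('<document>'):
>     if not part.strip():
>         continue  # skip the empty chunk before the first <document>
>     document = ET.fromstring('<document>' + part)
>     print(document.tag)
>     # ... do other stuff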
>
> Stefan
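
A Python 3 sketch of the split-and-save step discussed above, under the same
assumption that "<document>" only ever appears as a top-level element. The
function name split_documents, the output file names, and the sample data are
all made up for illustration. It uses a regular expression rather than
str.split, so whitespace between the documents is simply ignored.

```python
import re
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Matches each top-level <document>...</document> chunk. Assumes the tag
# never appears nested inside a document.
DOCUMENT_RE = re.compile(r"<document\b.*?</document>", re.DOTALL)


def split_documents(data, out_dir):
    """Write each <document> chunk found in data to its own file.

    Returns the list of paths written. File names are hypothetical.
    """
    out_dir = Path(out_dir)
    paths = []
    for index, match in enumerate(DOCUMENT_RE.finditer(data), start=1):
        # Parsing first means a malformed chunk raises ParseError
        # instead of being written out silently.
        element = ET.fromstring(match.group(0))
        path = out_dir / f"document_{index}.xml"
        path.write_bytes(ET.tostring(element, encoding="utf-8"))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Made-up stand-in for the downloaded resource.
    sample = "<document><a>1</a></document>\n<document><a>2</a></document>\n"
    with tempfile.TemporaryDirectory() as tmp:
        for written in split_documents(sample, tmp):
            print(written.name)
```

If the data comes from urllib.request.urlopen(url).read(), it arrives as
bytes, so decode it before passing it in.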