Parsing xml file using python

Tony Meyer t-meyer at ihug.co.nz
Sat Mar 6 23:16:03 EST 2004


> I need to read an XML document and ignore all XML tags and 
> write only those between the tags to a text file.  In other 
> words, if I have an XML document like so:
> 
> <tag1>This</tag1>
>     <tag2>is</tag2>
>        <tag3>a</tag3>
> <tag1>test</tag1>
> 
> I need to write "This is a test" to a text file.  How do I achieve
> this?

Apart from the XML parsing solutions that have already been posted, if you
really don't care about the tags, then you don't actually need to parse the
XML.  Just dump anything between < and > (this assumes that < and > only
ever appear as tag delimiters, so no comments, no CDATA sections, and no
literal > in the text or in attribute values).

For example (this is pretty basic):

>>> import re
>>> s = "<tag1>This</tag1>\n<tag2>is</tag2>\n<tag3>a</tag3>\n<tag1>test</tag1>"
>>> re.sub(r"<.*?>", "", s)
'This\nis\na\ntest'
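
To get the single line "This is a test" that the original question asks
for, the text left over after removing the tags can be split on whitespace
and rejoined with spaces.  A minimal sketch: the names strip_tags and
xml_to_text_file are made up for illustration, the files are assumed to be
UTF-8, and re.DOTALL is added so that a tag wrapped across two lines is
still removed:

```python
import re

# Non-greedy match of anything between < and >; DOTALL lets a tag span lines.
TAG_RE = re.compile(r"<.*?>", re.DOTALL)

def strip_tags(xml_text):
    """Drop everything between < and >, then collapse whitespace runs."""
    # Replace each tag with a space so neighbouring words don't run together.
    text = TAG_RE.sub(" ", xml_text)
    return " ".join(text.split())

def xml_to_text_file(xml_path, text_path):
    """Read xml_path, strip the tags, and write what is left to text_path."""
    with open(xml_path, encoding="utf-8") as src:
        stripped = strip_tags(src.read())
    with open(text_path, "w", encoding="utf-8") as dst:
        dst.write(stripped + "\n")

s = "<tag1>This</tag1>\n    <tag2>is</tag2>\n       <tag3>a</tag3>\n<tag1>test</tag1>"
print(strip_tags(s))  # This is a test
```

Replacing each tag with a space (rather than with nothing) is a judgment
call: it keeps "<a>one</a><b>two</b>" from becoming "onetwo", but it also
splits a word that has inline markup in the middle of it.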

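You'd still have to decode entities (&amp;, &lt; and so on) yourself, which
a real XML parser does for you, at least for the predefined ones.  If that
doesn't matter for your data, the regex approach is certainly simpler.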
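
For the five predefined XML entities, the standard library can do the
decoding once the tags are stripped.  A rough sketch, assuming the document
uses only &amp;, &lt;, &gt;, &quot; and &apos; (no custom entities from a
DTD and no numeric character references); the name strip_tags_and_entities
is invented here:

```python
import re
from xml.sax.saxutils import unescape

# unescape() handles &amp;, &lt; and &gt; on its own; the other two
# predefined XML entities have to be passed in explicitly.
EXTRA_ENTITIES = {"&quot;": '"', "&apos;": "'"}

def strip_tags_and_entities(xml_text):
    """Remove the tags first, then decode the predefined XML entities."""
    # Order matters: decoding first could turn &lt;b&gt; into a fake tag
    # that the regex would then delete.
    without_tags = re.sub(r"<.*?>", "", xml_text)
    return unescape(without_tags, EXTRA_ENTITIES)

print(strip_tags_and_entities("<tag1>Fish &amp; chips &lt;hot&gt;</tag1>"))
# Fish & chips <hot>
```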

=Tony Meyer




