Python regex question

Gerhard Häring gh at ghaering.de
Wed Jun 11 13:38:07 CEST 2008


Tim van der Leeuw wrote:
> Hi,
> 
> I'm trying to create a regular expression for matching some particular 
> XML strings. I want to extract the contents of a particular XML tag, 
> only if it follows one tag but does not follow another tag. Complicating 
> this, there can be any number of other tags in between. [...]

Sounds like this would be easier to implement using Python's SAX API.

Here's a short example that does something similar to what you want to 
achieve:

import xml.sax
import xml.sax.handler

test_str = """
<xml>
<ignore/>
<foo x="1" y="2"/>
<noignore/>
<foo x="3" y="4"/>
</xml>
"""

class MyHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        # True while the most recent start tag was <ignore>
        self.ignore_next = False

    def startElement(self, name, attrs):
        if name == "ignore":
            self.ignore_next = True
            return
        elif name == "foo":
            if not self.ignore_next:
                # handle the element you're interested in here
                print("MY ELEMENT %s with %r" % (name, dict(attrs)))

        # any start tag other than <ignore> clears the flag
        self.ignore_next = False

xml.sax.parseString(test_str, MyHandler())

To me, this is much clearer and easier to understand than a regular 
expression would be in this case.
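
If other tags really can appear between the marker tag and the element you 
want, note that the handler above clears ignore_next on every start tag other 
than <ignore>, so it only looks at the immediately preceding tag. Here is a 
minimal sketch of a variant in which only the marker tags change the state. 
MarkerHandler, find_foos and the <other/> tags are names made up for 
illustration, and I'm assuming "ignore" and "noignore" stand in for your two 
real tags:

```python
import xml.sax
import xml.sax.handler


class MarkerHandler(xml.sax.handler.ContentHandler):
    """Collect <foo> elements whose most recent marker tag was <noignore>."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.last_marker = None  # name of the most recent marker tag seen
        self.found = []          # attribute dicts of the accepted <foo> elements

    def startElement(self, name, attrs):
        if name in ("ignore", "noignore"):
            # Only marker tags change the state; other tags leave it alone.
            self.last_marker = name
        elif name == "foo" and self.last_marker == "noignore":
            self.found.append(dict(attrs))


def find_foos(xml_text):
    """Parse xml_text and return the attributes of every accepted <foo>."""
    handler = MarkerHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.found


# Hypothetical sample: <other/> tags sit between each marker and its <foo>.
sample = """<xml>
<ignore/><other/><foo x="1" y="2"/>
<noignore/><other/><other/><foo x="3" y="4"/>
</xml>"""

print(find_foos(sample))
```

Collecting the matches in a list instead of printing them makes the handler 
easier to reuse. Whether a <foo> seen before any marker tag should count is up 
to you; this sketch skips it.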

-- Gerhard



