Matching XML Tag Contents with Regex

Chris chrisspen at gmail.com
Tue Dec 11 13:04:52 EST 2007


On Dec 11, 11:41 am, garage <xmikeda... at gmail.com> wrote:
> > Is what I'm trying to do possible with Python's Regex library? Is
> > there an error in my Regex?
>
> Search for '*?' on http://docs.python.org/lib/re-syntax.html.
>
> To get around the greedy single match, you can add a question mark
> after the asterisk in the 'content' portion of the markup.  This
> causes it to take the shortest match instead of the longest, e.g.:
>
> <%(tagName)s\s[^>]*>[.\n\r\w\s\d\D\S\W]*?[^(%(tagName)s)]*
>
> There's still some funkiness in the regex and logic, but this gives
> you the three matches

Thanks, that's pretty close to what I was looking for. How would I
filter out tags that don't have certain text in the contents? I'm
running into the same issue again. For instance, if I use the regex:

<%(tagName)s\s[^>]*>[.\n\r\w\s\d\D\S\W]*?(targettext)+[^(%(tagName)s)]*

each match will include "targettext". However, some matches will still
include </%(tagName)s>, presumably from the tags which didn't contain
targettext.
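
A likely cause, offered as a guess rather than a diagnosis: the
[.\n\r\w\s\d\D\S\W]*? portion can match any character, closing tags
included, so the engine is free to start at a tag without targettext and
keep going until it finds targettext in a later one. Also,
[^(%(tagName)s)] is a character class, so it excludes the individual
characters of the tag name, not the tag name as a whole. A minimal sketch
of one possible workaround follows. It uses a negative lookahead so the
body can never step over the closing tag. The function name
find_tags_containing and the sample data are made up for illustration,
\b replaces \s so that tags without attributes also match, and nested
tags of the same name are not handled (a real XML parser would be the
safer choice there).

```python
import re

def find_tags_containing(text, tag_name, target_text):
    """Return each <tag_name ...>...</tag_name> whose body contains target_text.

    Hypothetical helper for illustration; assumes tags of the same name
    are not nested inside each other.
    """
    tag = re.escape(tag_name)
    # Consume one character at a time, but only while the closing tag does
    # not start at the current position. This keeps a match from running
    # past the end of the element it started in.
    body = r"(?:(?!</%s>).)*?" % tag
    pattern = re.compile(
        r"<%s\b[^>]*>%s%s%s</%s>" % (tag, body, re.escape(target_text), body, tag),
        re.DOTALL,  # let "." match newlines inside the tag body
    )
    return [m.group(0) for m in pattern.finditer(text)]

# Made-up sample input: only items 2 and 4 contain the target text.
sample = (
    '<item id="1">alpha</item>\n'
    '<item id="2">has targettext here</item>\n'
    '<item id="3">beta\ngamma</item>\n'
    '<item id="4">targettext again</item>\n'
)

matches = find_tags_containing(sample, "item", "targettext")
for m in matches:
    print(m)
```

An alternative with the same effect would be to match every element with
a plain non-greedy pattern first and then discard, in Python, the matches
that do not contain targettext.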



