regular expression: perl ==> python

Nick Craig-Wood nick at craig-wood.com
Thu Dec 23 01:48:52 EST 2004


Fredrik Lundh <fredrik at pythonware.com> wrote:
>  that's not a very efficient way to match multiple patterns, though.  a
>  much better way is to combine the patterns into a single one, and use
>  the "lastindex" attribute to figure out which one that matched.

lastindex is useful, yes.

> see
> 
>      http://effbot.org/zone/xml-scanner.htm
> 
>  for more on this topic.

I take your point. However, I don't find the example below very
readable: making 5 small regexps into 1 big one, plus a game of
count-the-brackets, doesn't strike me as a huge win...

import re

xml = re.compile(r"""
    <([/?!]?\w+)     # 1. tags
    |&(\#?\w+);      # 2. entities
    |([^<>&'\"=\s]+) # 3. text strings (no special characters)
    |(\s+)           # 4. whitespace
    |(.)             # 5. special characters
    """, re.VERBOSE)

It's probably faster though, so I give in gracelessly ;-)
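
For what it's worth, here is a minimal sketch of how the combined
pattern might be driven with lastindex. The KINDS list and the tokens()
helper are names made up for illustration (they are not from the
article linked above), and the sample input is invented too:

```python
import re

# The combined pattern quoted above: one capturing group per alternative.
xml = re.compile(r"""
    <([/?!]?\w+)     # 1. tags
    |&(\#?\w+);      # 2. entities
    |([^<>&'\"=\s]+) # 3. text strings (no special characters)
    |(\s+)           # 4. whitespace
    |(.)             # 5. special characters
    """, re.VERBOSE)

# Hypothetical labels, listed in the same order as the groups above.
KINDS = ["tag", "entity", "text", "whitespace", "special"]

def tokens(source):
    """Yield (kind, text) pairs; lastindex says which group matched."""
    for m in xml.finditer(source):
        # Exactly one top-level group takes part in each match, so
        # lastindex is the number of the alternative that fired.
        yield KINDS[m.lastindex - 1], m.group(m.lastindex)

result = list(tokens("<b>hi &amp; bye</b>"))
for kind, text in result:
    print(kind, repr(text))
```

The bracket counting only has to be done once, when writing KINDS;
after that the loop never mentions a group number directly. This only
works while the groups are not nested, since lastindex would then
report the outer group.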

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


