Strange regex problem

Gary Herron gherron at islandtraining.com
Sun Mar 2 14:11:34 EST 2003


On Saturday 01 March 2003 02:11 pm, Dagur Páll Ammendrup wrote:
> Hi,
>
> I have this code:
>
>      p = re.compile('<!--START:(\w*?)-->')
>      m = p.search(file)
>      while m:
>          begin = m.end()
>          match = m.group(1)
>          n = re.search('<!--END:%s-->' % match,file,begin)
>          locations.append((match,begin,n.start()))
>          m = p.search(file,begin)
>
> It works and everything but when I run it I get a long list of


The full answer has come out in pieces here, but I'll put it all
together.  The problem is a mistaken use of a module function
(re.search) and a collision with an undocumented flag SRE_FLAG_DEBUG.

As noted by others, the third parameter of re.search is not a search
position but a collection of flag bits. So this code, by passing "begin"
as that argument, can turn any of those bits on or off depending on the
bit pattern of its value.
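For what it's worth, here is a minimal sketch of how the loop could be
written so that "begin" really is a start position. The search method of
a compiled pattern does take a start position as its second argument,
unlike the module-level re.search. The function name find_sections, the
re.escape call, and the check for a missing END tag are my additions, not
part of the original code.

```python
import re

def find_sections(text):
    """Return (name, begin, end) for each START/END comment pair."""
    start_pat = re.compile(r'<!--START:(\w*?)-->')
    locations = []
    m = start_pat.search(text)
    while m:
        begin = m.end()
        name = m.group(1)
        # Compile the END pattern so that search() accepts a start position.
        end_pat = re.compile('<!--END:%s-->' % re.escape(name))
        n = end_pat.search(text, begin)
        if n is None:
            break  # no matching END tag; stop rather than crash on n.start()
        locations.append((name, begin, n.start()))
        m = start_pat.search(text, begin)
    return locations
```

Slicing text[begin:end] for each tuple then gives the body of that
section.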

The documented flags are re.IGNORECASE, re.LOCALE, re.MULTILINE,
re.DOTALL, re.UNICODE, and re.VERBOSE, but the code recognizes two
undocumented flags, one of which is called SRE_FLAG_DEBUG (having value
128).

So when the value of "begin" has that particular bit set, you turn
SRE_FLAG_DEBUG on and so get the debug output.
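To see which flags a given offset would switch on, something like the
following sketch can help. It assumes a Python version where the debug
flag is exposed as re.DEBUG (it is in current releases); the helper name
flags_set_by is made up for illustration.

```python
import re

# Flag names discussed above, plus the debug flag (value 128).
FLAG_NAMES = ["IGNORECASE", "LOCALE", "MULTILINE", "DOTALL",
              "UNICODE", "VERBOSE", "DEBUG"]

def flags_set_by(value):
    """List the flag names whose bit is set in an integer such as 'begin'."""
    return [name for name in FLAG_NAMES
            if value & int(getattr(re, name))]
```

For example, flags_set_by(128) reports only DEBUG, while an offset of 130
would also turn on IGNORECASE.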


I have just (last week) volunteered to take over maintenance of the
regular expression code, so I'll think about fixing this, but it's not
clear to me what a fix should entail.

 * Document the flag.

 * Ignore the flag.

 * Raise an exception for any flag bit other than the documented flags.

 * Others?
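As a rough illustration of the "raise an exception" option, a check along
these lines could reject unknown bits. This is only a sketch of the idea,
not what the re module does; check_flags and DOCUMENTED are hypothetical
names.

```python
import re

# The six documented flags listed above, combined into one bit mask.
DOCUMENTED = int(re.IGNORECASE | re.LOCALE | re.MULTILINE |
                 re.DOTALL | re.UNICODE | re.VERBOSE)

def check_flags(flags):
    """Raise ValueError if 'flags' has any bit outside the documented set."""
    unknown = int(flags) & ~DOCUMENTED
    if unknown:
        raise ValueError("undocumented flag bits: %#x" % unknown)
    return flags
```

With this in place, a stray offset such as 128 would fail loudly instead
of silently enabling debug output.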

Gary Herron





