Regex anomaly

Roy Smith roy at panix.com
Tue Jan 3 00:15:59 EST 2006


<mike.klaas at gmail.com> wrote:
>
>Hello,
>
>Has anyone had issues with compiled re's vis-a-vis the re.I (ignore
>case) flag?  I can't make sense of this compiled re producing a
>different match when given the flag, odd both in its difference from
>the uncompiled regex (as I thought the uncompiled api was a wrapper
>around a compile-and-execute block) and its difference from the
>compiled version with no flag specified.  The match given is utter
>nonsense given the input re.
>
>In [48]: import re
>In [49]: reStr = r"([a-z]+)://"
>In [51]: against = "http://www.hello.com"
>In [53]: re.match(reStr, against).groups()
>Out[53]: ('http',)
>In [54]: re.match(reStr, against, re.I).groups()
>Out[54]: ('http',)
>In [55]: reCompiled = re.compile(reStr)
>In [56]: reCompiled.match(against).groups()
>Out[56]: ('http',)
>In [57]: reCompiled.match(against, re.I).groups()
>Out[57]: ('tp',)

LOL, and you'll be LOL too when you see the problem :-)

You can't give the re.I flag to reCompiled.match().  You have to give
it to re.compile().  The second argument to reCompiled.match() is the
position at which to start matching.  I'm guessing re.I is defined as
2, so the match started at index 2 of the string, which explains the
('tp',) you got.
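
To make that concrete, here is a minimal sketch using the pattern and
input from the session above.  The variable names (compiled,
by_accident, and so on) are made up for illustration; only re.compile()
and the compiled pattern's match(string, pos) come from the re module.

```python
import re

pattern = r"([a-z]+)://"
against = "http://www.hello.com"

compiled = re.compile(pattern)

# re.I has the integer value 2, so this call is read as
# compiled.match(against, pos=2): matching starts at "tp://..."
by_accident = compiled.match(against, re.I).groups()

# Passing the position explicitly gives the same result.
explicit_pos = compiled.match(against, 2).groups()

# The flag belongs in re.compile(); the compiled pattern then
# ignores case for every match.
compiled_ci = re.compile(pattern, re.I)
intended = compiled_ci.match("HTTP://www.hello.com").groups()

print(by_accident, explicit_pos, intended)
```

The first two results are identical, which is the giveaway that the
flag was taken as a start position.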

This is actually one of those places where duck typing let us down.
If we had type bondage, re.I would be an instance of RegExFlags, and
reCompiled.match() would have thrown a TypeError when the second
argument wasn't an integer.  I'm not saying type bondage is inherently
better than duck typing, just that it has its benefits at times.
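
As a rough sketch of what such a check could look like: in Python
releases newer than this post (3.6 and later), re.I is a member of
re.RegexFlag, which is still an int subclass, so match() accepts it as
a position without complaint.  A wrapper can reject it by hand.
strict_match() below is a hypothetical helper, not something the re
module provides.

```python
import re

def strict_match(compiled, string, pos=0):
    """Hypothetical wrapper: refuse a regex flag passed as a position."""
    # re.RegexFlag members are ints, so the re module itself accepts
    # them as pos; this explicit check is the added "type bondage".
    if isinstance(pos, re.RegexFlag):
        raise TypeError(
            "pos must be a plain integer position; "
            "pass flags to re.compile() instead"
        )
    return compiled.match(string, pos)

compiled = re.compile(r"([a-z]+)://")

# A normal call still works.
ok = strict_match(compiled, "http://www.hello.com").groups()

# Passing re.I where a position is expected now fails loudly.
try:
    strict_match(compiled, "http://www.hello.com", re.I)
    rejected = False
except TypeError:
    rejected = True
```

This is only one way to get the error; it trades a little convenience
for catching the mistake at the call site.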


