"\b" behaviour at end of string, was RE: Simple (?) Regular Expression Question
Peter Otten
__peter__ at web.de
Mon Jan 19 04:25:18 EST 2004
Tim Peters wrote:
> [Steve Zatz]
>> Is '@' a special character in regular expressions?
>
> Nope.
>
>> I am asking because I don't understand the following:
>>
>> >>> import re
>> >>> s = ' @'
>> >>> re.sub(r'\b@','*',s)
>> ' @'
>> >>> s = ' a'
>> >>> re.sub(r'\ba','*',s)
>> ' *'
>
> \b matches a "word boundary", meaning it has to have a word character on
> one side (something that matches \w), and a non-word character on the
> other (something that matches \W), regardless of order. ' @' contains two
> non-word characters (' ' and '@'), so \b doesn't match anything in it.
> ' a' contains a non-word character (' ') followed by a word character
> ('a'), so \b matches (an empty string) between those two characters.
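
To make the rule above concrete, here is a small sketch that computes boundary positions by hand and compares them with re.sub(). The helpers is_word() and boundaries() are made up for illustration and are not part of the re module. The sketch assumes that the positions just before the first and just after the last character count as "non-word" for this purpose, which is what the documentation for \b describes.

```python
import re

def is_word(ch):
    # Hypothetical helper: True if the single character matches \w.
    return re.match(r"\w", ch) is not None

def boundaries(s):
    # Hypothetical helper: every index i where exactly one neighbour
    # (s[i-1] or s[i]) is a word character. Positions outside the
    # string are treated as non-word.
    found = []
    for i in range(len(s) + 1):
        before = i > 0 and is_word(s[i - 1])
        after = i < len(s) and is_word(s[i])
        if before != after:
            found.append(i)
    return found

print(boundaries(" @"))      # no word characters, so no boundaries
print(boundaries(" a"))      # one before 'a', one after it
print(boundaries("alpha"))   # only the two ends of the word

# The substitutions from the original question, for comparison.
print(repr(re.sub(r"\b@", "*", " @")))
print(repr(re.sub(r"\ba", "*", " a")))
```

Under those assumptions, "alpha" has exactly two boundaries, at 0 and 5, which are the only distinct positions that show up in the finditer() output.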
Playing around with it a bit, I noticed that finditer() never terminates for
the regular expression r"\b": once it reaches the end of the string it keeps
returning the same empty match there.
Python 2.3.3 (#1, Jan 3 2004, 13:57:08)
[GCC 3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> r = re.compile(r"\b")
>>> [m.start() for (i, m) in zip(range(10), r.finditer("alpha"))]
[0, 5, 5, 5, 5, 5, 5, 5, 5, 5]
>>>
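
A minimal sketch of the same experiment that cannot hang, whatever the interpreter does: itertools.islice() caps the number of matches consumed, and set() collapses repeated positions. The cap of 10 is just the number used in the session above; I would expect interpreters without this behaviour to yield each position only once, but that is an assumption, not something shown in this thread.

```python
import re
from itertools import islice

pattern = re.compile(r"\b")

# Take at most 10 matches, so the loop ends even if finditer()
# keeps yielding the empty match at the end of the string.
starts = [m.start() for m in islice(pattern.finditer("alpha"), 10)]
print(starts)

# Collapse repeats to see which distinct positions matched.
distinct = sorted(set(starts))
print(distinct)
```

Either way, the distinct positions should be the start and the end of "alpha".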
Bug or feature?
Peter