[ python-Bugs-1611131 ] \b in unicode regex gives strange results

Thu Dec 7 22:44:28 CET 2006

Bugs item #1611131, was opened at 2006-12-07 23:44
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1611131&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Regular Expressions
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: akaihola (akaihola)
Assigned to: Gustavo Niemeyer (niemeyer)
Summary: \b in unicode regex gives strange results

Initial Comment:
The problem: This doesn't give a match:
>>> re.match(r'ä\b', 'ä ', re.UNICODE)

This works ok and gives a match:
>>> re.match(r'.\b', 'ä ', re.UNICODE)

Both of these work as well:
>>> re.match(r'a\b', 'a ', re.UNICODE)
>>> re.match(r'.\b', 'a ', re.UNICODE)

Docs say \b is defined as an empty string between \w and \W. These do match accordingly:
>>> re.match(r'\w', 'ä', re.UNICODE)
>>> re.match(r'\w', 'a', re.UNICODE)
>>> re.match(r'\W', ' ', re.UNICODE)

So something strange happens in my first example, and I can't help but assume it's a bug.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1611131&group_id=5470