[ python-Bugs-1611131 ] \b in unicode regex gives strange results
SourceForge.net
noreply at sourceforge.net
Thu Dec 7 22:44:28 CET 2006
Bugs item #1611131, was opened at 2006-12-07 23:44
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1611131&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Regular Expressions
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: akaihola (akaihola)
Assigned to: Gustavo Niemeyer (niemeyer)
Summary: \b in unicode regex gives strange results
Initial Comment:
The problem: this gives no match:
>>> re.match(r'ä\b', 'ä ', re.UNICODE)
This works and gives a match:
>>> re.match(r'.\b', 'ä ', re.UNICODE)
Both of these work as well:
>>> re.match(r'a\b', 'a ', re.UNICODE)
>>> re.match(r'.\b', 'a ', re.UNICODE)
The docs define \b as the empty string between \w and \W. These match as expected:
>>> re.match(r'\w', 'ä', re.UNICODE)
>>> re.match(r'\w', 'a', re.UNICODE)
>>> re.match(r'\W', ' ', re.UNICODE)
So something strange happens in my first example, and I can't help but assume it's a bug.
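One possible explanation, which this report does not confirm, is that the pattern and the subject are byte strings holding UTF-8, so 'ä' is really the two bytes 0xC3 0xA4 and re.UNICODE classifies each byte as its own character. The sketch below is written for Python 3 rather than 2.5 and only simulates that situation by decoding the UTF-8 bytes as Latin-1. The helper as_byte_values and all variable names are hypothetical, introduced for illustration.

```python
import re

# U+00E4 (a with diaeresis) followed by a space, as in the report.
text = '\u00e4 '
pattern = '\u00e4' + r'\b'

# Matching on decoded text: the letter is \w and the space is \W,
# so \b finds a boundary between them.
text_match = re.match(pattern, text, re.UNICODE)


def as_byte_values(s):
    """Hypothetical helper: view the UTF-8 bytes of s as one code point per byte."""
    return s.encode('utf-8').decode('latin-1')


# Assumption: this mimics a byte-string pattern and subject on a UTF-8 terminal.
raw_text = as_byte_values(text)                 # '\xc3\xa4 '
raw_pattern = as_byte_values('\u00e4') + r'\b'  # '\xc3\xa4' followed by \b

# U+00C3 is a letter (\w), but U+00A4 is a currency sign (\W). After the
# second "byte" there is \W on both sides, so \b cannot match there.
bytes_like_match = re.match(raw_pattern, raw_text, re.UNICODE)

# '.' consumes only the first "byte"; the boundary then lies between
# U+00C3 (\w) and U+00A4 (\W), so this one matches.
dot_match = re.match(r'.\b', raw_text, re.UNICODE)

# \w also matches just the first "byte", which looks like matching the letter.
word_match = re.match(r'\w', raw_text, re.UNICODE)

print(text_match is not None)
print(bytes_like_match is None)
print(ascii(dot_match.group()), ascii(word_match.group()))
```

If this assumption holds, passing unicode objects for both the pattern and the subject (u'...' literals in Python 2) should make the first example match as well.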
----------------------------------------------------------------------