[Python-bugs-list] [ python-Bugs-610299 ] unicode alphanumeric regexp bug

SourceForge.net noreply@sourceforge.net
Sun, 23 Feb 2003 17:29:20 -0800


Bugs item #610299, was opened at 2002-09-16 21:18
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=610299&group_id=5470

Category: Regular Expressions
Group: Python 2.3
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Florent Guillaume (efge)
>Assigned to: Guido van Rossum (gvanrossum)
Summary: unicode alphanumeric regexp bug

Initial Comment:
I get the following results with Python 2.1, 2.2 and
2.3a0 (Debian):

>>> import re
>>> re.compile(r'\w+', re.U).sub('X', u'hello caf\xe9')
u'X X'
>>> re.compile(r'\w{1}', re.U).sub('X', u'hello caf\xe9')
u'XXXXX XXXX'
>>> re.compile(r'\w', re.U).sub('X', u'hello caf\xe9')
u'XXXXX XXX\xe9'

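The first two results are correct, but the third is not:
the single-character pattern \w leaves u'\xe9' unmatched,
even though \w+ and \w{1} both treat it as a word character.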
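The expected behavior above can be written as a small regression check. This is an illustrative sketch, not code taken from the patch. It assumes Python 3, where `str` patterns already use Unicode matching and `u''` literals are unnecessary. The re.U flag is passed only to mirror the original report.

```python
import re

text = "hello caf\xe9"  # "\xe9" is U+00E9, LATIN SMALL LETTER E WITH ACUTE

# Python 3 str patterns match Unicode by default; re.U is redundant here
# and is kept only to mirror the flag used in the original report.
plus = re.compile(r"\w+", re.U).sub("X", text)
single_repeat = re.compile(r"\w{1}", re.U).sub("X", text)
single = re.compile(r"\w", re.U).sub("X", text)

# All three patterns should treat "\xe9" as a word character.
assert plus == "X X"
assert single_repeat == "XXXXX XXXX"
assert single == single_repeat  # the case that failed in the report
print(single)
```

On an interpreter with the fix, all three assertions pass. On an affected version, the last assertion would fail because `single` would end in an unreplaced "\xe9".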


----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2003-02-23 20:29

Message:
Logged In: YES 
user_id=6380

Fixed in 2.3 CVS using Greg's patch. Will backport to 2.2 as
well.

----------------------------------------------------------------------

Comment By: Greg Chapman (glchapman)
Date: 2002-11-04 11:51

Message:
Logged In: YES 
user_id=86307

I just posted a small patch to sre_compile.py which should fix this:

http://sourceforge.net/tracker/?func=detail&aid=633359&group_id=5470&atid=305470

----------------------------------------------------------------------
