[Python-Dev] Unicode regexp problem

Florent Guillaume fg@nuxeo.com
17 Sep 2002 03:00:09 +0200


I have the following problem with Python 2.1, 2.2 and 2.3a0 (Debian):

>>> import re
>>> re.compile(r'\w+',   re.U).sub('X', u'hello caf\xe9')
u'X X'
>>> re.compile(r'\w{1}', re.U).sub('X', u'hello caf\xe9')
u'XXXXX XXXX'
>>> re.compile(r'\w',    re.U).sub('X', u'hello caf\xe9')
u'XXXXX XXX\xe9'

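The first two results are correct, but the third is not: the trailing u'\xe9' should also have been replaced by 'X'. It is a word character in Unicode mode, and the \w{1} pattern does replace it.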
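For comparison, here is a minimal consistency check that runs the same three patterns. It assumes a modern Python 3 interpreter, where str patterns already match Unicode word characters and re.U is accepted as a redundant flag. The check encodes the expectation stated above: a lone \w should behave exactly like \w{1}, because '\xe9' is a Unicode letter.

```python
import re

# Sketch assuming Python 3: str patterns use Unicode matching by default,
# so passing re.U (re.UNICODE) is redundant but still allowed.
text = 'hello caf\xe9'

patterns = [r'\w+', r'\w{1}', r'\w']
results = {p: re.compile(p, re.U).sub('X', text) for p in patterns}

for p, r in results.items():
    print(repr(p), '->', repr(r))

# '\xe9' (e with acute accent) is alphanumeric in Unicode, so a single \w
# should replace every word character, exactly like \w{1}.
assert '\xe9'.isalnum()
assert results[r'\w'] == results[r'\w{1}'] == 'XXXXX XXXX'
assert results[r'\w+'] == 'X X'
```

On an interpreter affected by the problem described here, the r'\w' entry would instead come out as 'XXXXX XXX\xe9' and the second assertion would fail.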

Thanks,

Florent


PS: I'd appreciate a Cc on answers.

-- 
Florent Guillaume, Nuxeo (Paris, France)
+33 1 40 33 79 87  http://nuxeo.com  mailto:fg@nuxeo.com