[Python-Dev] Unicode regexp problem
Florent Guillaume
fg@nuxeo.com
17 Sep 2002 03:00:09 +0200
I've got the following problem in Python 2.1, 2.2 and 2.3a0 (Debian):
>>> import re
>>> re.compile(r'\w+', re.U).sub('X', u'hello caf\xe9')
u'X X'
>>> re.compile(r'\w{1}', re.U).sub('X', u'hello caf\xe9')
u'XXXXX XXXX'
>>> re.compile(r'\w', re.U).sub('X', u'hello caf\xe9')
u'XXXXX XXX\xe9'
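The first two results are OK, but the third is not: with re.U, a bare \w
should also match \xe9, so I would expect u'XXXXX XXXX' there, the same as
for \w{1}.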
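To state the expectation precisely, here is a minimal sketch of a check. It
assumes a Python 3 interpreter, where str literals are Unicode and need no u
prefix (the sessions above are Python 2). The helper name replace_word_chars
is made up for illustration. On an interpreter without the problem, the bare
\w pattern should give the same result as \w{1}.

```python
import re

# Same sample string as in the session above; \xe9 is a Latin-1 accented letter.
TEXT = 'hello caf\xe9'


def replace_word_chars(pattern):
    """Replace every match of `pattern` in TEXT with 'X' (hypothetical helper)."""
    return re.compile(pattern, re.UNICODE).sub('X', TEXT)


one_or_more = replace_word_chars(r'\w+')    # each whole word becomes one X
exactly_one = replace_word_chars(r'\w{1}')  # each word character becomes X
bare = replace_word_chars(r'\w')            # should behave exactly like \w{1}

print(repr(one_or_more))
print(repr(exactly_one))
print(repr(bare))
```

If the last two printed values differ, the interpreter shows the behaviour
reported here.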
Thanks,
Florent
PS: I'd appreciate a Cc on answers.
--
Florent Guillaume, Nuxeo (Paris, France)
+33 1 40 33 79 87 http://nuxeo.com mailto:fg@nuxeo.com