re and locale/unicode
jerry.fleming at saybot.com
Tue Sep 21 05:21:55 CEST 2010
Given the following Python code:
import re
re.findall(r'(?uL)\s+', u'\u2001\u3000\x20', re.U|re.L)
re.findall(r'\s+', u'\u2001\u3000\x20', re.U|re.L)
I was wondering why it doesn't find the Unicode space characters \u2001 and
\u3000. The Python docs for the re module say this about \s:
When the LOCALE and UNICODE flags are not specified, matches any
whitespace character; this is equivalent to the set [ \t\n\r\f\v]. With
LOCALE, it will match this set plus whatever characters are defined as
space for the current locale. If UNICODE is set, this will match the
characters [ \t\n\r\f\v] plus whatever is classified as space in the
Unicode character properties database.
That doesn't seem to work: neither call matches \u2001 or \u3000. Any ideas?
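A minimal sketch for comparison, assuming Python 3.6 or later (the u'' literals above suggest the original ran on Python 2). As far as I can tell from Python 2's sre_compile, when both LOCALE and UNICODE are set the LOCALE branch is checked first, so \s falls back to the C library's isspace(), which only looks at code points below 256. That would leave just \x20 matched. Python 3 str patterns use Unicode matching by default and reject LOCALE outright. The helper names describe() and locale_with_str_rejected() are hypothetical, made up for this illustration.

```python
import re
import unicodedata

# The sample string from the post: EM QUAD, IDEOGRAPHIC SPACE, ASCII space.
TEXT = '\u2001\u3000\x20'

# Python 3 str patterns use Unicode matching by default, so \s covers every
# character for which str.isspace() is true, including \u2001 and \u3000.
unicode_runs = re.findall(r'\s+', TEXT)

# Passing re.UNICODE explicitly is redundant for str patterns but allowed.
explicit_runs = re.findall(r'\s+', TEXT, re.UNICODE)

# re.ASCII restricts \s to [ \t\n\r\f\v], so only the ASCII space matches,
# which mirrors the result described in the post.
ascii_runs = re.findall(r'\s+', TEXT, re.ASCII)


def describe(text):
    """Hypothetical helper: (code point, name, category, isspace) per char."""
    return [
        (f'U+{ord(ch):04X}', unicodedata.name(ch),
         unicodedata.category(ch), ch.isspace())
        for ch in text
    ]


def locale_with_str_rejected():
    """Hypothetical helper: True if re refuses LOCALE with a str pattern."""
    try:
        re.compile(r'\s+', re.UNICODE | re.LOCALE)
    except ValueError:
        return True
    return False


if __name__ == '__main__':
    for row in describe(TEXT):
        print(row)
    print(unicode_runs, explicit_runs, ascii_runs)
    print(locale_with_str_rejected())
```

All three characters are in the Unicode category Zs, so plain \s+ should return them as a single run. On Python 3, the fix amounts to dropping re.L (and the inline L flag) rather than combining it with re.U.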