howto combine regex special sequences?

Carey Evans careye at spamcop.net
Thu Oct 25 20:35:12 EDT 2001


Laura Creighton <lac at strakt.com> writes:

[...]

> I've been up for 36 hours, but the expression you are looking for is
>  r = re.compile('[^\w\s]')
> 
> But you don't want to do that.  Z is the last letter of the alphabet 
> in New Zealand, but Ö is the last letter here in Sweden.  Use the
> string methods instead.

Or use the appropriate flags with the regular expression.  The Unicode
character classes are always the same, but the characters that count
as alphanumeric under re.LOCALE vary with the locale, from the default
"C" locale, based on US-ASCII:

>>> import locale, re
>>> locale.setlocale(locale.LC_ALL, 'C')
'C'
>>> re.match(r'[^\w\s]', 'ö') and 'matched'
'matched'
>>> re.match(r'[^\w\s]', 'ö', re.LOCALE) and 'matched'
'matched'
>>> re.match(r'[^\w\s]', 'ö', re.UNICODE) and 'matched'
>>> 

to other locales that use ISO-8859-1:

>>> import locale, re
>>> locale.setlocale(locale.LC_ALL, 'en_NZ')
'en_NZ'
>>> re.match(r'[^\w\s]', 'ö') and 'matched'
'matched'
>>> re.match(r'[^\w\s]', 'ö', re.LOCALE) and 'matched'
>>> re.match(r'[^\w\s]', 'ö', re.UNICODE) and 'matched'
>>> 
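
For readers on Python 3, the same comparison can be sketched roughly as
follows.  This is only an illustration under the assumption of a Python 3
interpreter, where str patterns are Unicode-aware by default and re.ASCII
restricts \w to US-ASCII; the helper name is_punct is made up for this
example and is not from the sessions above.

```python
import re

# Same character class as in the sessions above: neither "word" nor whitespace.
PUNCT = r'[^\w\s]'

def is_punct(ch, flags=0):
    """Return True if ch matches [^\\w\\s] under the given flags.

    is_punct is a hypothetical helper used only for this illustration.
    """
    return re.match(PUNCT, ch, flags) is not None

# Python 3 str patterns use Unicode character classes by default,
# so 'ö' counts as a word character and is not matched.
print(is_punct('ö'))            # False

# re.ASCII limits \w to [a-zA-Z0-9_], which resembles the "C" locale result.
print(is_punct('ö', re.ASCII))  # True

# Real punctuation matches either way.
print(is_punct('!'))            # True
```

Note that in Python 3 re.LOCALE is accepted only with bytes patterns, so
the locale-dependent case would need the text encoded first (for example
to ISO-8859-1), and its result still depends on which locales are
installed.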

-- 
	 Carey Evans  http://home.clear.net.nz/pages/c.evans/

		      "Ha ha!  Puny receptacle!"


