RE + UTF-8
Michael Ströder
michael at stroeder.com
Sun Sep 25 01:44:08 EDT 2005
cepl at surfbest.net wrote:
>
> I have tried to test RE and UTF-8 in Python generally and the results
> are even more confusing (done with locale cs_CZ.UTF-8 in konsole):
>
> >>> locale.getpreferredencoding()
>
> 'UTF-8'
>
> >>> print re.sub("(\w*)","X","[Chelcický]",re.L)
You first have to turn the byte strings into Unicode strings. On your
console that would presumably be:
unicode('[Chelcický]','utf-8')
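A minimal sketch of the same idea in Python 3. This is an assumption, since the thread is about Python 2, where unicode(s, 'utf-8') plays the role of bytes.decode('utf-8'). It uses \w+ instead of the quoted (\w*) so the output stays readable. It also passes flags by keyword: in re.sub the fourth positional argument is count, so the re.L in the quoted call is taken as a replacement count, not as a flag.

```python
import re

# The bytes a UTF-8 console delivers for the text "[Chelcický]".
raw = "[Chelcický]".encode("utf-8")

# Bytes pattern: \w only matches ASCII word characters, so the two bytes
# of the UTF-8 encoded "ý" are left outside the match.
print(re.sub(rb"\w+", b"X", raw))  # b'[X\xc3\xbd]'

# Decode first (the Python 3 counterpart of unicode(s, 'utf-8')).
# str patterns use Unicode matching, so "ý" counts as a word character.
# flags is given by keyword; positionally it would be taken as count.
text = raw.decode("utf-8")
print(re.sub(r"\w+", "X", text, flags=re.UNICODE))  # [X]
```

re.UNICODE is redundant for str patterns in Python 3 and is shown only to mirror the Python 2 situation, where it had to be requested explicitly.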
Note that in web applications you also have to declare the charset in the
HTTP headers and in <form accept-charset=...>.
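A hedged Python 3 sketch of the web side, assuming the whole application standardizes on UTF-8. The page declares its charset in the Content-Type header, the form asks the browser for the same charset via accept-charset, and the server decodes the submitted urlencoded body with it. The helper names and the /search action are hypothetical.

```python
from urllib.parse import parse_qs

# Assumption: every page and form of the application uses UTF-8.
CHARSET = "utf-8"

def response_headers():
    # Hypothetical helper: declare the charset of the HTML page itself.
    return [("Content-Type", f"text/html; charset={CHARSET}")]

def form_html(action):
    # Hypothetical helper: accept-charset asks the browser to submit
    # the form data encoded in CHARSET.
    return (f'<form method="post" action="{action}" accept-charset="{CHARSET}">'
            '<input name="q"></form>')

def decode_form_body(body: bytes) -> dict:
    # A urlencoded body is plain ASCII (non-ASCII bytes are %-escaped),
    # so decode it as ASCII, then let parse_qs %-decode using CHARSET.
    return parse_qs(body.decode("ascii"), encoding=CHARSET)

fields = decode_form_body(b"q=%5BChelcick%C3%BD%5D")
print(fields["q"][0])  # [Chelcický]
```

The decoded value is a Unicode string, so the regular expression from the console example can be applied to it unchanged.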
Ciao, Michael.