Unicode question
Gerhard Häring
gh at ghaering.de
Fri Jul 18 05:41:23 EDT 2003
Ricardo Bugalho wrote:
> On Fri, 18 Jul 2003 02:07:13 +0200, Gerhard Häring wrote:
>>>Gerhard Häring <gh at ghaering.de> writes:
>>>> >>> u"äöü"
>>>>u'\x84\x94\x81'
>>>> [this works, but IMO shouldn't]
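A minimal sketch of what the quoted session appears to show, written for Python 3 (where byte strings and text are `bytes` and `str`). It assumes the interactive console delivered code page 437 bytes; u'\x84\x94\x81' suggests that, since 0x84, 0x94 and 0x81 are ä, ö and ü in cp437. The variable names are illustrative only.

```python
# Assumption: the console sent cp437-encoded bytes for the typed "äöü".
console_bytes = "\u00e4\u00f6\u00fc".encode("cp437")
print(console_bytes)                        # b'\x84\x94\x81'

# A Latin-1 default maps each byte to the code point with the same value,
# which reproduces the surprising u'\x84\x94\x81' quoted above.
as_latin1 = console_bytes.decode("latin-1")
print(ascii(as_latin1))                     # '\x84\x94\x81'

# Decoding with the encoding the bytes were really written in recovers the umlauts.
as_cp437 = console_bytes.decode("cp437")
print(as_cp437 == "\u00e4\u00f6\u00fc")     # True
```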
>
> You can use string literals in any encoding like this:
> 'string in my favorite encoding'.decode('my favorite encoding').
> Note the lack of the u prefix. Not very convenient, though.
> u'string' ends up doing the same as 'string'.decode('latin1').
Yep. It's the latin1 default that I'm criticizing.
> It doesn't work for docstrings, though.
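The explicit-decode idiom above, sketched in Python 3 syntax as an assumption: b"" literals stand in for the Python 2 plain-string literals, and the function name `documented` is hypothetical. It shows why a silent Latin-1 default is risky (Latin-1 accepts every byte, so a wrong guess garbles text instead of failing) and why the idiom cannot produce a docstring (only a bare string literal becomes `__doc__`, not a method-call expression).

```python
utf8_bytes = b"\xc3\xa4\xc3\xb6\xc3\xbc"    # "äöü" as saved by a UTF-8 editor

# Explicit decoding with the encoding the text was actually written in.
right = utf8_bytes.decode("utf-8")
# The same bytes under a Latin-1 assumption: no error, just mojibake.
wrong = utf8_bytes.decode("latin-1")

print(right == "\u00e4\u00f6\u00fc")        # True
print(ascii(wrong))                         # '\xc3\xa4\xc3\xb6\xc3\xbc'


def documented():
    b"\xc3\xa4 docstring".decode("utf-8")   # an expression, not a literal
    return None


print(documented.__doc__)                   # None
```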
>
> I'm not sure what you mean by encoding cookie,
See PEP 263 @ http://www.python.org/peps/pep-0263.html
> but I like the idea
> of each source file having some element that defines the encoding used to
> process string literals.
Then you'll be glad to hear that exactly this is implemented in Python 2.3:
#!/usr/bin/python
# -*- coding: latin1 -*-
...
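One way to check how such a cookie is read, sketched with the Python 3 standard library rather than 2.3 (an assumption; the source text is illustrative). `tokenize.detect_encoding` looks at the first two lines, so the cookie after the shebang is found, and `compile()` honours it when given the raw bytes.

```python
import codecs
import io
import tokenize

# A file laid out like the example above, saved as Latin-1.
source = (
    "#!/usr/bin/python\n"
    "# -*- coding: latin1 -*-\n"
    "s = u'\u00e4\u00f6\u00fc'\n"
).encode("latin-1")

encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
print(codecs.lookup(encoding).name)         # iso8859-1

# Compiling the raw bytes lets the parser apply the declared encoding.
namespace = {}
exec(compile(source, "<latin1-demo>", "exec"), namespace)
print(namespace["s"] == "\u00e4\u00f6\u00fc")   # True
```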
> Either that, or we define that Python code must be written in UTF-8.
You can do that in Python 2.3 as well. Just save your source file with a
UTF-8 BOM and you don't even have to explicitly define an encoding
using an encoding cookie.
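A sketch of the BOM route, again checked with Python 3's standard library (an assumption; in Python 3 UTF-8 is the default source encoding per PEP 3120, so the BOM is optional there). A file starting with the UTF-8 BOM and carrying no cookie is detected as `utf-8-sig` and compiles correctly.

```python
import codecs
import io
import tokenize

# UTF-8 BOM followed by source text, with no encoding cookie.
source = codecs.BOM_UTF8 + "s = u'\u00e4\u00f6\u00fc'\n".encode("utf-8")

encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)                             # utf-8-sig

namespace = {}
exec(compile(source, "<bom-demo>", "exec"), namespace)
print(namespace["s"] == "\u00e4\u00f6\u00fc")   # True
```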
> But that would break lots of code.. :D
You'll get warnings if you don't declare an encoding (either with an
encoding cookie or a BOM) and use 8-bit characters in your source files.
These warnings will become errors in later Python versions.
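How this looks in Python 3, as a sketch (it is not the 2.3 warning itself): with UTF-8 as the default source encoding, a lone Latin-1 byte and no declaration is rejected at compile time with a `SyntaxError`, while the same bytes behind a cookie compile fine. The sample source is illustrative.

```python
# One Latin-1 byte (0xE4, "ä") and no encoding declaration.
undeclared = b"s = '\xe4'\n"
try:
    compile(undeclared, "<undeclared>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)                             # True

# The same line behind a PEP 263 cookie is accepted.
declared = b"# -*- coding: latin-1 -*-\n" + undeclared
namespace = {}
exec(compile(declared, "<declared>", "exec"), namespace)
print(namespace["s"] == "\u00e4")           # True
```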
It's all in the PEP :)
-- Gerhard
More information about the Python-list mailing list