[Python-Dev] Python in Unicode context

"Martin v. Löwis" martin at v.loewis.de
Tue Aug 3 19:24:11 CEST 2004


François Pinard wrote:
> One thing is that a Python module should have some way to know the
> encoding used in its source file, maybe some kind of `module.__coding__'
> next to `module.__file__', saving the coding effectively used while
> compilation was going on. 

That would be possible to implement. Feel free to create a patch.
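
As a rough illustration of the idea (not the patch itself): later Python
versions ship `tokenize.detect_encoding`, which recovers the declared
encoding from the raw source bytes. The helper name `source_coding` is
made up for this sketch, and no `module.__coding__` attribute is assumed
to exist.

```python
import io
import tokenize


def source_coding(source_bytes):
    """Return the encoding declared in the first two lines of a source file."""
    # detect_encoding looks for a BOM or a PEP 263 coding cookie and
    # returns the encoding name plus the lines it had to read.
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# Hypothetical module source carrying a coding cookie.
sample = b"# -*- coding: iso-8859-15 -*-\nprint('hello')\n"
declared = source_coding(sample)
```

A `__coding__` attribute, if someone implemented it, would presumably
store a value like `declared` at compile time instead of re-reading the
file.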

> I wonder if some other cookie, next to the `coding:'
> cookie, could not be used to declare that all strings _in this module
> only_ should be interpreted as Unicode by default, but without the need
> of resorting to `u' prefix all over.

This could be a starting point of another syntax debate. For example,

    from __future__ import string_literals_are_unicode

would be possible to implement. If PEP 244 had been adopted, I
would have proposed

    directive unicode_strings

Other syntax forms would also be possible. Again, if you know a syntax
which you like, propose a patch. Be prepared to also write a PEP 
defending that syntax.

> P.S. - Should I say and confess, one thing I do not like much about
> Unicode is how proponents often perceive it, like a religion, and all
> the fanatism going with it.  Unicode should be seen and implemented as
> a choice, more than a life commitment :-). Right now, my feeling is that
> Python asks a bit too much of a programmer, in terms of commitment, if
> we only consider the editing work required on sources to use it, or not.

Not sure what you are referring to here. You do have the choice of
source encodings, and, in fact, "Unicode" is not a valid source
encoding. "UTF-8" is, and from a Python point of view, there is
absolutely no difference between that and, say, "ISO-8859-15" -
you can choose whatever source encoding you like, and Python does
not favour any of them (strictly speaking, it favours ASCII, then
ISO-8859-1, then the rest).
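
A small sketch of that point, assuming a Python version where `compile`
accepts bytes and honours the coding cookie: the same Unicode literal,
stored on disk in two different source encodings, yields the same string
object. The names `TEMPLATE` and `load_name` are invented for the
example.

```python
# Source text with a coding cookie and one Unicode literal containing
# a non-ASCII character (o with diaeresis).
TEMPLATE = u'# -*- coding: {0} -*-\nname = u"L\u00f6wis"\n'


def load_name(encoding):
    """Encode the source in `encoding`, compile it, and return `name`."""
    source_bytes = TEMPLATE.format(encoding).encode(encoding)
    namespace = {}
    # The compiler decodes the bytes according to the declared cookie.
    exec(compile(source_bytes, "<demo>", "exec"), namespace)
    return namespace["name"]


from_utf8 = load_name("utf-8")
from_latin9 = load_name("iso-8859-15")
```

The bytes on disk differ between the two files, but the Unicode string
seen by the program does not.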

Choice of source encoding is different from the choice of string
literals. You can use Unicode strings, or byte strings, or mix them.
It really is your choice.
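
To make the distinction concrete, here is a minimal sketch. It uses the
notation of later Python versions, where a `b` prefix marks a byte
string (an assumption relative to this thread, in which unprefixed
literals are byte strings and `u` marks Unicode).

```python
# A Unicode string: a sequence of code points, independent of any encoding.
text = u"Fran\u00e7ois"

# Byte strings: the same text after choosing an encoding.
as_utf8 = text.encode("utf-8")          # c-cedilla becomes two bytes
as_latin9 = text.encode("iso-8859-15")  # c-cedilla becomes one byte

# Decoding with the matching encoding recovers the same Unicode string.
round_trip_utf8 = as_utf8.decode("utf-8")
round_trip_latin9 = as_latin9.decode("iso-8859-15")
```

Mixing the two kinds of string means deciding, at each boundary, which
encoding converts between them.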

Regards,
Martin

