print() and unicode strings (python 3.1)

Nobody nobody at nowhere.com
Tue Aug 25 08:34:33 EDT 2009


On Tue, 25 Aug 2009 03:41:54 -0700, 7stud wrote:

> Why does echoing $LC_ALL or $LC_CTYPE just give me a blank string?

Because the variables aren't set.

The default locale for a particular category (e.g. LC_CTYPE) is taken from
$LC_ALL if that is set, otherwise $LC_CTYPE, otherwise $LANG, otherwise
"C" is used.
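
As a rough illustration of that lookup order, here is a minimal sketch in
Python. The function name effective_locale and its env parameter are made up
for this example, and it assumes an empty variable counts the same as an
unset one:

```python
import os


def effective_locale(category, env=None):
    """Return the locale name that applies to `category` (e.g. "LC_CTYPE").

    Hypothetical helper: it mirrors the lookup order described above and is
    not part of the standard library.
    """
    if env is None:
        env = os.environ
    # LC_ALL overrides everything, then the category's own variable,
    # then LANG as the general fallback.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # assumption: an empty value is treated like "not set"
            return value
    return "C"


print(effective_locale("LC_CTYPE"))
```

With none of the three variables set, this prints "C", which is why you get
ascii on such a system.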

Normally, you would either set LANG (and possibly some individual LC_*
variables), or LC_ALL. There's no point in setting all of them.

> In conclusion, as far as I can tell, if your python 3.1 program tries
> to output a unicode string, and the unicode string cannot be encoded
> by the codec specified in the user's LANG environment variable**, then
> the user will get an encode error. Just because the programmer's
> system can handle the output doesn't mean that another user's system
> can.  I guess that's the way it goes: if a user's environment is
> telling all programs that it only wants ascii output to go to the
> screen(sys.stdout), you can't(or shouldn't) do anything about it.
> 
> **Or if the LANG environment variable is not present, then the codec
> corresponding to the locale settings ('C' corresponds to ascii).

The underlying OS primitives (read() and write()) handle only bytes.
Whenever you read or write a (unicode) string, Python needs to know which
encoding to use. For Python file objects created by the user (via open()
etc.), you can specify the encoding; for those created by the runtime (e.g.
sys.stdin), Python uses the locale's LC_CTYPE category to select an encoding.
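
To make the distinction concrete, here is a small sketch, assuming a
reasonably recent Python 3 (tempfile.TemporaryDirectory is newer than 3.1).
The file name example.txt is only a placeholder:

```python
import os
import sys
import tempfile

text = "caf\u00e9"  # contains one non-ASCII character

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.txt")  # hypothetical file name
    # A file you open yourself: you choose the encoding explicitly.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    # Reading it back in binary mode shows the bytes that hit the disk.
    with open(path, "rb") as f:
        raw = f.read()

print(raw)

# A stream created by the runtime: the encoding was chosen for you.
stdout_encoding = getattr(sys.stdout, "encoding", None)
print(stdout_encoding)
```

The first print shows the UTF-8 bytes regardless of the locale; the second
depends on the environment the program was started in.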

Data written to or read from text streams is encoded or decoded using the
stream's encoding. Filenames are encoded and decoded using the
filesystem encoding (sys.getfilesystemencoding()). Anything else uses the
default encoding (sys.getdefaultencoding()).
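
The encode error you describe can be reproduced without touching the real
stdout. The sketch below uses an in-memory stream as a stand-in for a stdout
whose encoding is ascii; the variable names are made up for the example:

```python
import io
import sys

# Stand-in for sys.stdout under the "C" locale: a text stream whose
# encoding is ascii, layered over an in-memory binary stream.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

try:
    ascii_stream.write("caf\u00e9")  # U+00E9 has no ascii encoding
    ascii_stream.flush()
    failed = False
except UnicodeEncodeError:
    failed = True

print(failed)

# The other two encodings mentioned above can be inspected directly.
fs_encoding = sys.getfilesystemencoding()
default_encoding = sys.getdefaultencoding()
print(fs_encoding, default_encoding)
```

The stream's own encoding decides whether the write succeeds; the filesystem
and default encodings play no part in it.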

In Python 3, text streams are handled using io.TextIOWrapper:

	http://docs.python.org/3.1/library/io.html#text-i-o

This implements a stream which can read and/or write text data on top of
one which can read and/or write binary data. The sys.std{in,out,err}
streams are instances of TextIOWrapper. You can get the underlying
binary stream from the "buffer" attribute, e.g.:

	sys.stdout.buffer.write(b'hello world\n')

If you need to force a specific encoding (e.g. if the user has specified
an encoding via a command-line option), you can detach the existing
wrapper and create a new one, e.g.:

	sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding=new_encoding)
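
The same detach-and-rewrap step can be tried on an in-memory stream first.
This is only a sketch: the helper name rewrap and its errors parameter are
hypothetical, and settings of the old wrapper other than the encoding (such
as line buffering) are not carried over:

```python
import io


def rewrap(stream, encoding, errors="strict"):
    """Replace a text stream's wrapper with one using `encoding`.

    Hypothetical helper. The old wrapper is unusable after detach().
    """
    stream.flush()  # push out anything written under the old encoding
    return io.TextIOWrapper(stream.detach(), encoding=encoding, errors=errors)


# Demonstration on an in-memory stream that starts out as ascii.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="ascii")
out = rewrap(out, "utf-8")
out.write("caf\u00e9")  # would have raised UnicodeEncodeError under ascii
out.flush()

print(raw.getvalue())
```

For the real stream, the equivalent call would be
sys.stdout = rewrap(sys.stdout, new_encoding).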
