print() and unicode strings (python 3.1)
bbxx789_05ss at yahoo.com
Tue Aug 25 12:41:54 CEST 2009
On Aug 24, 10:09 pm, Ned Deily <n... at acm.org> wrote:
> In article
> <e5e2ec2e-2b4a-4ca8-8c0f-109e5f4eb... at v23g2000pro.googlegroups.com>,
> 7stud <bbxx789_0... at yahoo.com> wrote:
> > On Aug 24, 2:41 pm, "Martin v. Löwis" <mar... at v.loewis.de> wrote:
> > > > I can't figure out a way to programmatically set the encoding for
> > > > sys.stdout. So where does that leave me?
> > > You should be setting the terminal encoding administratively, not
> > > programmatically.
> > The terminal encoding has always been utf-8. It was not set
> > programmatically.
> > It seems to me that python 3.1's string handling is broken.
> > Apparently, in python 3.1 I am unable to explicitly set the encoding
> > of a string and print() it out with the result being human readable
> > text. On the other hand, if I let python do the encoding implicitly,
> > python uses a codec I don't want it to.
> If you are running on a Unix-y system, check your locale settings (LANG,
> LC_*, et al). I think you'll likely find that your locale is really not
> UTF-8. The following was on Python 3.1 on OS X 10.5, similar results
> on Debian Linux:
> $ cat t3.py
> import sys
> s = "€"
> $ export LANG=en_US.UTF-8
> $ python3.1 t3.py
> $ export LANG=C
> $ python3.1 t3.py
> Traceback (most recent call last):
> File "t3.py", line 7, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in
> position 0: ordinal not in range(128)
> Ned Deily,
> n... at acm.org
Thanks for the response. My OS is mac osx 10.4.11. I'm not really
sure how to check my locale settings. Here is some stuff I tried:
$ echo $LANG
$ echo $LC_ALL
$ echo $LC_CTYPE
Used as a substitute for any unset LC_* variable. If LANG is unset it
will act as if set to "C". If any of LANG or LC_* are set to invalid
values, locale acts as if they are all unset.
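
The fallback rule quoted above can be sketched in Python. This is only an illustration of the usual POSIX lookup order for the character-type category (LC_ALL, then LC_CTYPE, then LANG, then "C"). The helper name effective_ctype is made up for this example, and it does not check whether the value names a valid locale.

```python
import os

def effective_ctype(env=None):
    """Return the locale name that would govern LC_CTYPE.

    Hypothetical helper: follows the POSIX precedence
    LC_ALL > LC_CTYPE > LANG > "C". Validity is not checked.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # an unset variable and an empty one are treated alike
            return value
    return "C"

print(effective_ctype())
```

With none of the three variables set, this reports "C", which matches the blank output of the echo commands above.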
As in your last example, my 'C' settings mean that an ascii codec is
used somewhere to encode() the unicode string.
The locale C or POSIX is a portable locale; its LC_CTYPE part
corresponds to the 7-bit ASCII character set.
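
A small sketch of what "7-bit ASCII" means for the euro sign, independent of any terminal or locale setting. It only calls str.encode() directly, so it shows the codec behaviour rather than what print() does:

```python
s = "\u20ac"  # the euro sign, written as an escape so the source stays ASCII

# utf-8 can represent the character (as three bytes)
print(s.encode("utf-8"))

# ascii cannot, so encode() raises the same error seen in the traceback
try:
    s.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)
```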
Is this the way it works:
1) python sets the codec for sys.stdout from the LANG environment variable
2) It doesn't matter that my terminal's encoding is set to utf-8
because output has to pass through sys.stdout first.
a) My terminal's environment is telling python (and all other programs
running in the terminal) that output sent to sys.stdout must be
encoded in ascii.
b) The solution is to set a LANG environment variable.
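
One way to check the points above is to ask python what it actually picked. A minimal sketch, assuming it is run from the same terminal session whose settings are in question:

```python
import locale
import os
import sys

# Show the locale-related environment variables as python sees them
for name in ("LANG", "LC_ALL", "LC_CTYPE"):
    print(name, "=", repr(os.environ.get(name)))

# The codec print() will use for text written to sys.stdout
print("sys.stdout.encoding:", sys.stdout.encoding)

# The encoding python derives from the locale settings
print("locale.getpreferredencoding():", locale.getpreferredencoding())
```

Running it once with LANG unset and once after exporting a UTF-8 locale should show whether the codec follows the environment rather than the terminal's own encoding preference.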
Why does echoing $LC_ALL or $LC_CTYPE just give me a blank string?
Previously, I've set environment variables that I want to be
permanent, e.g. PATH, in ~/.bash_profile, so I did this:
and now python 3.1 acts like I expect it to:
s = "€"
In conclusion, as far as I can tell, if your python 3.1 program tries
to output a unicode string, and the unicode string cannot be encoded
by the codec specified in the user's LANG environment variable**, then
the user will get an encode error. Just because the programmer's
system can handle the output doesn't mean that another user's system
can. I guess that's the way it goes: if a user's environment is
telling all programs that it only wants ascii output to go to the
screen (sys.stdout), you can't (or shouldn't) do anything about it.
**Or if the LANG environment variable is not present, then the codec
corresponding to the locale settings ('C' corresponds to ascii).
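
If a program should degrade instead of raising an error on such a system, one possible approach is to escape whatever the user's codec cannot represent. This is a sketch, not a recommendation from the thread: safe_text is a hypothetical helper, and it keeps the user's chosen encoding rather than overriding it.

```python
import sys

def safe_text(text, encoding=None):
    """Return text with unencodable characters turned into \\uXXXX escapes.

    Hypothetical helper: uses the encoding of sys.stdout unless another
    one is passed in, and falls back to ascii if none is known.
    """
    enc = encoding or sys.stdout.encoding or "ascii"
    return text.encode(enc, errors="backslashreplace").decode(enc)

print(safe_text("price: \u20ac5"))
```

Under an ascii stdout the euro sign comes out as an escape sequence; under utf-8 the string passes through unchanged.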
some good locale info: