Yet another unicode WTF

Gabriel Genellina gagsl-py2 at
Thu Jun 4 22:09:47 EDT 2009

On Thu, 04 Jun 2009 22:18:24 -0300, Ron Garret <rNOSPAMon at> wrote:

> Python 2.6.2 on OS X 10.5.7:
> [ron at mickey:~]$ echo $LANG
> en_US.UTF-8
> [ron at mickey:~]$ cat
> #!/usr/bin/env python
> print u'\u03BB'
> [ron at mickey:~]$ ./
> ª
> [ron at mickey:~]$ ./ > foo
> Traceback (most recent call last):
>   File "./", line 2, in <module>
>     print u'\u03BB'
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u03bb' in
> position 0: ordinal not in range(128)
> (That's supposed to be a small greek lambda, but I'm using a
> brain-damaged news reader that won't let me set the character encoding.
> It shows up correctly in my terminal.)
> According to what I thought I knew about unix (and I had fancied myself
> a bit of an expert until just now) this is impossible.  Python is
> obviously picking up a different default encoding when its output is
> being piped to a file, but I always thought one of the fundamental
> invariants of unix processes was that there's no way for a process to
> know what's on the other end of its stdout.

It may be hard to know *who* is at the other end of the pipe, but it's  
easy to know *what* kind of file it is.
Lots of programs detect whether stdout is a tty or not (using isatty(3))  
and adapt their output accordingly; ls is one example.
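
As a rough illustration of that check from Python (a sketch only; the helper
name is_terminal is made up for this example), os.isatty() wraps the same
isatty(3) call:

```python
import os
import sys


def is_terminal(stream):
    """Return True if `stream` is attached to a terminal, False otherwise."""
    try:
        fd = stream.fileno()
    except (AttributeError, ValueError, OSError):
        # No underlying file descriptor (e.g. an in-memory buffer).
        return False
    return os.isatty(fd)


if __name__ == "__main__":
    # Report on stderr so the message stays visible when stdout is redirected.
    kind = "a terminal" if is_terminal(sys.stdout) else "not a terminal"
    sys.stderr.write("stdout is %s\n" % kind)
```

Run directly, this should report a terminal; run with "> foo" appended, it
should report that stdout is not one. The process still cannot tell who
reads the file later, only what kind of descriptor it was handed.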

Python knows the terminal encoding (or at least can make a good guess),  
but a file may use *any* encoding you want, completely unrelated to your  
terminal settings. So when stdout is redirected, Python refuses to guess  
its encoding and falls back to ASCII, hence the error above; to choose an  
encoding explicitly, set the PYTHONIOENCODING environment variable.
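
Besides setting PYTHONIOENCODING (for example to utf-8) in the environment,
a script can encode its output itself. The following is only a sketch: the
helper name encode_for and the UTF-8 fallback are assumptions made for this
example, not something Python does on its own. On Python 2, as in the
original post, sys.stdout.encoding is None when stdout is redirected, which
is the case the fallback covers.

```python
def encode_for(stream, text, fallback="utf-8"):
    """Encode `text` for `stream`.

    Uses the stream's own encoding when it reports one, and `fallback`
    (an assumption of this sketch) when the encoding is missing or None.
    """
    encoding = getattr(stream, "encoding", None) or fallback
    return text.encode(encoding)
```

The resulting byte string can then be written to the stream: on Python 2
with sys.stdout.write(), on Python 3 with sys.stdout.buffer.write(). The
choice of fallback is the script's responsibility, since the file on the
other end may expect any encoding.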

Gabriel Genellina
