the stupid encoding problem to stdout

Benjamin Kaplan benjamin.kaplan at case.edu
Wed Jun 8 23:00:38 EDT 2011


2011/6/8 Sérgio Monteiro Basto <sergiomb at sapo.pt>:
> hi,
> cat test.py
> #!/usr/bin/env python
> #-*- coding: utf-8 -*-
> u = u'moçambique'
> print u.encode("utf-8")
> print u
>
> chmod +x test.py
> ./test.py
> moçambique
> moçambique
>
> ./test.py > output.txt
> Traceback (most recent call last):
>  File "./test.py", line 5, in <module>
>    print u
> UnicodeEncodeError: 'ascii' codec can't encode character
> u'\xe7' in position 2: ordinal not in range(128)
>
> in python 2.7
> how I explain to python to send the same thing to stdout and
> the file output.txt ?
>
> Don't seems logic, when send things to a file the beaviour
> change.
>
> Thanks,
> Sérgio M. B.

That's not a terminal vs. file thing. It's a "file that declares its
encoding" vs. a "file that doesn't declare its encoding" thing. Your
terminal declares that it is UTF-8. So when you print a Unicode string
to your terminal, Python knows that it's supposed to turn it into
UTF-8. When you redirect the output to a file, that file doesn't
declare an encoding (sys.stdout.encoding is None). So rather than guess
which encoding you want, Python defaults to the lowest common
denominator: ASCII. If you want something to be a particular encoding,
you have to encode it yourself.
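
To see the difference without touching the shell, here is a minimal
sketch of what print effectively attempts in each case. The helper
try_encode is a name I made up for illustration, and the string is
written with an escape so the snippet does not depend on the source
file's encoding.

```python
import sys

u = u'mo\xe7ambique'  # the string from test.py, c-cedilla written as an escape

def try_encode(text, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)

# What stdout declares: usually 'UTF-8' on a UTF-8 terminal. On Python 2.7
# it is None when stdout is redirected, and print then falls back to ASCII.
declared = getattr(sys.stdout, 'encoding', None)

utf8_result = try_encode(u, 'utf-8')    # succeeds, yields UTF-8 bytes
ascii_result = try_encode(u, 'ascii')   # fails, yields the error text
```

The ASCII attempt produces the same "ordinal not in range(128)"
message as your traceback, which is what happens once the declared
encoding is missing.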

You have a couple of choices on how to make it work:
1) Play dumb and always encode as UTF-8. This would look really weird
if someone tried running your program in a terminal with a CP437
encoding (like cmd.exe on at least the US version of Windows), but it
would never crash.
2) Check sys.stdout.encoding. If it's None or ASCII, then encode your
unicode string with the unicode-escape codec, which substitutes escape
sequences for all non-ASCII characters (u.encode('ascii',
'backslashreplace') does the same but leaves newlines alone).
3) Check to see if sys.stdout.isatty() and have different behavior for
terminals vs files. If you're on a terminal that doesn't declare its
encoding, encoding it as UTF-8 probably won't help. If you're writing
to a file, that might be what you want to do.
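
The choices above can be combined into one small helper. This is only a
sketch under my own assumptions: encode_for_stream is a hypothetical
name, not something in the standard library, and the policy (escape on
a terminal with no usable encoding, UTF-8 for files and pipes) is just
one reasonable reading of options 2 and 3.

```python
import sys

# Names under which a stream may report plain ASCII.
ASCII_NAMES = ('ascii', 'us-ascii', 'ansi_x3.4-1968')

def encode_for_stream(text, stream):
    """Return `text` as bytes suited to `stream` (hypothetical helper)."""
    declared = getattr(stream, 'encoding', None)
    if declared and declared.lower() not in ASCII_NAMES:
        # The stream declares a real encoding, so use it. 'replace' keeps
        # us from crashing on characters that encoding cannot represent.
        return text.encode(declared, 'replace')
    if stream.isatty():
        # A terminal with no usable encoding: UTF-8 bytes would probably
        # show up as garbage, so fall back to backslash escapes.
        return text.encode('ascii', 'backslashreplace')
    # A file or pipe that declares nothing: assume UTF-8 is wanted.
    return text.encode('utf-8')
```

On Python 2.7 you would then write the result with
sys.stdout.write(encode_for_stream(u, sys.stdout) + '\n'). On Python 3
the bytes would have to go to sys.stdout.buffer instead.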


