[Baypiggies] Unicode woes using print and redirects...

Mitch Patenaude patenaude at gmail.com
Sat Aug 22 01:56:49 CEST 2009


I have a problem when outputting some strings using the print statement.
If a unicode string contains non-ASCII characters, print does the right
thing as long as stdout hasn't been redirected.

I have tried to use codecs.EncodedFile to wrap sys.stdout and get it to
recode the output, or to fool it into thinking that stdout can handle
utf8 in either case, but that only causes *both* cases to fail, even
though I pass in either errors='ignore' or errors='replace'.  I'm
stumped.  Can anyone enlighten me?

Details:
When I run it from the command line without redirection it works fine:
mitch at phobos:~/src/pylib/twarkov$ ./enc_test.py
isatty
Encoding: UTF-8
foo⑵

Yay!

but when I redirect the output at all, it fails:
mitch at phobos:~/src/pylib/twarkov$ ./enc_test.py | cat
notatty
Encoding: None
Damnit!  'ascii' codec can't encode character u'\u2475' in position 3:
ordinal not in range(128)
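
As far as I can tell, the redirected case comes down to print falling
back to the default 'ascii' codec when sys.stdout.encoding is None.  A
minimal sketch of just that encoding step, independent of stdout
(try_encode is a hypothetical helper, and the "except ... as" syntax
assumes Python 2.6 or later):

```python
# The same string that enc_test.py tries to print.
trouble = u'foo\u2475\n'


def try_encode(text, encoding):
    """Return the encoded bytes, or the error message if encoding fails.

    Hypothetical helper, used only to compare codecs side by side.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as e:
        return str(e)


# 'ascii' cannot represent U+2475, so this yields an error message
# similar to the one in the transcript above.
ascii_result = try_encode(trouble, 'ascii')

# 'utf-8' can represent it, so this yields a byte string.
utf8_result = try_encode(trouble, 'utf-8')
```

If that reading is right, the string itself is fine and the only thing
that changes under redirection is which codec gets picked.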

enc_test.py:
#!/usr/bin/python

import sys
import codecs

ttyout = sys.stderr

if sys.stdout.isatty():
  ttyout.write('isatty\n')
else:
  ttyout.write('notatty\n')

ttyout.write('Encoding: %s\n' % sys.stdout.encoding)

fooout = codecs.EncodedFile(sys.stdout, 'utf-8',
                            file_encoding='utf-8', errors='ignore')

trouble=u'foo\u2475\n'

try:
  # fooout.write(trouble)
  print trouble
  ttyout.write('Yay!\n')
except UnicodeEncodeError, e:
  ttyout.write('Damnit!  %s\n' % (e,))
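
For comparison, here is a sketch of a wrapper that only encodes on the
way out, using codecs.getwriter instead of codecs.EncodedFile.  This is
an assumption about what might help, not something verified on the
machine described below; utf8_writer is a hypothetical helper, and
io.BytesIO stands in for a redirected stdout so the result can be
inspected:

```python
import codecs
import io


def utf8_writer(byte_stream, errors='replace'):
    """Wrap a byte stream so unicode written to it is encoded as UTF-8.

    Hypothetical helper; codecs.getwriter returns the StreamWriter
    class for the named codec, which is then instantiated on the stream.
    """
    return codecs.getwriter('utf-8')(byte_stream, errors)


# Stand-in for a redirected sys.stdout (an assumption for illustration).
fake_stdout = io.BytesIO()

out = utf8_writer(fake_stdout)
out.write(u'foo\u2475\n')

# The bytes that would have reached the pipe or file.
captured = fake_stdout.getvalue()
```

In the real script the idea would be to write to a wrapper around
sys.stdout rather than relying on print to choose an encoding.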

mitch at phobos:~/src/pylib/twarkov$ uname -a
Linux phobos 2.6.24-24-generic #1 SMP Tue Aug 18 17:04:53 UTC 2009
i686 GNU/Linux
mitch at phobos:~/src/pylib/twarkov$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=8.04
DISTRIB_CODENAME=hardy
DISTRIB_DESCRIPTION="Ubuntu 8.04.3 LTS"
mitch at phobos:~/src/pylib/twarkov$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

