[Python-Dev] File encodings
Gustavo Niemeyer
niemeyer at conectiva.com
Mon Nov 29 20:04:48 CET 2004
Greetings,
Today, while internationalizing a program I'm working on, I found
an interesting side effect of how we handle the encoding of unicode
strings when they are written to files.
Consider the following example:

    # -*- encoding: iso-8859-1 -*-
    print u"á"
This will correctly print the string 'á', as expected. Now, what
surprises me is that the following code won't work in an equivalent
way (unless sys.setdefaultencoding() is used):
    # -*- encoding: iso-8859-1 -*-
    import sys
    sys.stdout.write(u"á\n")
This will raise the following error:
    Traceback (most recent call last):
      File "asd.py", line 3, in ?
        sys.stdout.write(u"á")
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1'
    in position 0: ordinal not in range(128)
This difference can become a really annoying problem when
internationalizing programs, since third-party code commonly writes
to sys.stdout directly instead of using 'print'. The standard
optparse module, for instance, holds a reference to sys.stdout which
is used in the default --help handling mechanism.
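To illustrate (a hypothetical sketch: the option and its translated
help text are made up, but the failure mode is the one above, since
optparse writes the help text through its sys.stdout reference):

    # -*- encoding: iso-8859-1 -*-
    import optparse

    parser = optparse.OptionParser()
    # A translated help string containing a non-ASCII character
    # turns the formatted help into a unicode string.
    parser.add_option("--value", help=u"opção")
    # --help makes optparse write that unicode help text to
    # sys.stdout, raising the same UnicodeEncodeError as above.
    parser.parse_args(["--help"])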
Given that file objects have an 'encoding' attribute, and that any
unicode string with characters outside the 0-127 range raises an
exception when written to a file, isn't it reasonable to respect
the 'encoding' attribute whenever writing data to a file?
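When stdout is a terminal, for instance, the detected encoding is
already exposed there, and encoding explicitly does work (a minimal
sketch; the 'ascii' fallback for redirected output is my assumption):

    # -*- encoding: iso-8859-1 -*-
    import sys

    # sys.stdout.encoding is set when stdout is a terminal, but may
    # be None when output is redirected, hence the explicit fallback.
    encoding = sys.stdout.encoding or "ascii"
    sys.stdout.write(u"á\n".encode(encoding))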
The workaround for this problem is to either use the
widely-considered-evil sys.setdefaultencoding(), or to wrap
sys.stdout (sketched below). IMO, both options seem unreasonable
for such a common idiom.
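For reference, the wrapping variant looks roughly like this (a
sketch using codecs.getwriter, assuming iso-8859-1 is the desired
output encoding):

    # -*- encoding: iso-8859-1 -*-
    import sys
    import codecs

    # Replace sys.stdout with a StreamWriter that encodes unicode
    # strings to iso-8859-1 before passing them to the real stdout.
    sys.stdout = codecs.getwriter("iso-8859-1")(sys.stdout)
    sys.stdout.write(u"á\n")  # now works without an exception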
--
Gustavo Niemeyer
http://niemeyer.net