Today, while trying to internationalize a program I'm working on, I found an interesting side-effect of how unicode strings are encoded when they are written to files.
Take the following example:
    # -*- encoding: iso-8859-1 -*-
    print u"á"
This correctly prints the string 'á', as expected. Now, what surprises me is that the following code won't work in an equivalent way (unless sys.setdefaultencoding() is used):
    # -*- encoding: iso-8859-1 -*-
    import sys
    sys.stdout.write(u"á\n")
This will raise the following error:
    Traceback (most recent call last):
      File "asd.py", line 3, in ?
        sys.stdout.write(u"á")
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 0: ordinal not in range(128)
This difference can become a really annoying problem when trying to internationalize programs, since it's common for third-party code to write to sys.stdout directly instead of using 'print'. The standard optparse module, for instance, holds a reference to sys.stdout which is used by the default --help handling mechanism.
Given that file objects have an 'encoding' attribute, and that writing a unicode string containing any character outside the 0-127 range to a file raises an exception, isn't it reasonable to respect that 'encoding' attribute whenever unicode data is written to a file?
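For what it's worth, the attribute is right there, it's just not honored (a quick interactive check; the ISO-8859-1 value is only an example of what a Latin-1 terminal reports, and it will be None when output is redirected):

    >>> import sys
    >>> sys.stdout.encoding          # set from the terminal, yet ignored below
    'ISO-8859-1'
    >>> sys.stdout.write(u"\xe1\n")  # u"á" still goes through the ascii codec
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 0: ordinal not in range(128)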
The workaround for this problem is either to use sys.setdefaultencoding(), which is widely considered evil, or to wrap sys.stdout. IMO, both options seem unreasonable for such a common idiom.
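For reference, the wrapping workaround looks roughly like this (a minimal sketch that hard-codes iso-8859-1 as the target encoding; sys.stdout.encoding could be consulted instead when it's set):

    # -*- encoding: iso-8859-1 -*-
    import sys, codecs

    # Replace sys.stdout with a stream writer that encodes unicode on the way out.
    sys.stdout = codecs.getwriter("iso-8859-1")(sys.stdout)
    sys.stdout.write(u"á\n")   # now works without sys.setdefaultencoding()

It works, but every program (or at least every program that imports third-party code writing to sys.stdout) has to remember to do it, which is exactly the kind of boilerplate the 'encoding' attribute seems meant to avoid.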