What is file.encoding convention?
vinay_sajip at yahoo.co.uk
Thu Jul 23 21:42:29 CEST 2009
On Jul 23, 4:06 am, Naoki INADA <songofaca... at gmail.com> wrote:
> In document <http://docs.python.org/library/
> >> The encoding that this file uses. When Unicode strings are written to a file,
> >> they will be converted to byte strings using this encoding. In addition,
> >> when the file is connected to a terminal, the attribute gives the encoding
> >> that the terminal is likely to use
> But in logging.StreamHandler.emit() ::
> if (isinstance(msg, unicode) and
> getattr(stream, 'encoding', None)):
> #fs = fs.decode(stream.encoding)
> stream.write(fs % msg)
> except UnicodeEncodeError:
> #Printing to terminals sometimes fails. For example,
> #with an encoding of 'cp1251', the above write will
> #work if written to a stream opened or wrapped by
> #the codecs module, but fail when writing to a
> #terminal even when the codepage is set to
> #An extra encoding step seems to be
> stream.write((fs % msg).encode
> stream.write(fs % msg)
> except UnicodeError:
> stream.write(fs % msg.encode("UTF-8"))
> And behavior of sys.stdout in Windows::
> >>> import sys
> >>> sys.stdout.encoding
> >>> u = u"あいう"
> >>> u
> u'\u3042\u3044\u3046'
> >>> print >>sys.stdout, u
> >>> sys.stderr.write(u)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 0-2: ordinal not in range(128)
> What is file.encoding convention?
> If I want to write a unicode string to a file(-like) that has an
> encoding attribute, I should do
> (1) try: file.write(unicode_str),
> (2) except UnicodeEncodeError: file.write(unicode_str.encode
> It seems ugly.
Further to my earlier mail, please have a look at the following
As you can see, the codepage is set to 1251 (Cyrillic) at the
beginning. A Unicode string is initialised with Cyrillic code points.
Then sys.stdout.encoding shows 'cp1251', but writing the string to it
gives a UnicodeEncodeError. Explicitly encoding the string and writing
it works. Next, we get a wrapper from the codecs module for the same
encoding and use it to wrap sys.stdout. Writing the Unicode string to
the wrapped stream works, too.
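
The two approaches that work can be sketched roughly as follows. This is
only an illustration under stated assumptions: an io.BytesIO object stands
in for a byte-oriented stream such as sys.stdout (a real console behaves
differently), and the sample text is made up.

```python
import codecs
import io

# Stand-in for a byte-oriented stream (assumption: not a real console).
raw = io.BytesIO()

# Made-up sample text consisting of Cyrillic code points.
text = u"\u0434\u043e\u0431\u0440\u044b\u0439"

# Approach 1: encode explicitly, then write the resulting bytes.
raw.write(text.encode("cp1251"))

# Approach 2: wrap the stream with a codecs writer for the same encoding;
# the wrapper encodes each Unicode string before passing it on.
wrapped = codecs.getwriter("cp1251")(raw)
wrapped.write(text)

data = raw.getvalue()
```

Both writes put the same cp1251 bytes into the underlying stream, so data
holds the encoded text twice.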
So the problem is essentially this: if a stream has an encoding
attribute, sometimes it is a wrapped stream which encodes Unicode
(e.g. a stream obtained via the codecs module) and sometimes it is not
(e.g. sys.stdout, sys.stderr).
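
One way to cope with both kinds of stream, similar in spirit to the quoted
StreamHandler.emit() code, is to try the plain write first and fall back to
explicit encoding. What follows is a minimal sketch, not the logging
module's implementation: write_text and DeclaredEncodingStream are
hypothetical names, the UTF-8 fallback is an assumption, and TypeError is
caught as well because a Python 3 byte stream rejects text with TypeError
rather than UnicodeEncodeError.

```python
import codecs
import io


class DeclaredEncodingStream(io.BytesIO):
    """Hypothetical byte stream that advertises an encoding but does not encode."""
    encoding = "cp1251"


def write_text(stream, text, fallback="utf-8"):
    """Write text to stream, encoding explicitly if the plain write is rejected."""
    try:
        # Works for streams that encode Unicode themselves (e.g. codecs wrappers).
        stream.write(text)
    except (UnicodeEncodeError, TypeError):
        # The stream only advertises an encoding, so encode on its behalf.
        encoding = getattr(stream, "encoding", None) or fallback
        stream.write(text.encode(encoding))


text = u"\u0434\u0430"  # made-up sample text

# A stream with an encoding attribute that does not encode on write.
plain = DeclaredEncodingStream()
write_text(plain, text)

# A stream wrapped by the codecs module, which does encode on write.
backing = io.BytesIO()
wrapped = codecs.getwriter("cp1251")(backing)
write_text(wrapped, text)
```

In this sketch both streams end up holding the same cp1251 bytes, although
only the second one did the encoding itself.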