sys.stdout.write()'s bug or doc bug?
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Sun Dec 28 09:31:11 EST 2008
On Sun, 28 Dec 2008 02:37:55 -0800, Qiangning Hong wrote:
>> > So, my question is, as sys.stdout IS a file object, why it does not
>> > use its encoding attribute to convert the given unicode? An
>> > implementation bug? A documentation bug?
>>
>> hmm I always thought "sys.stdout" is a "file-like object" not that it
>> IS a file.
>
> In my original post, I have figured out that sys.stdout IS a file, by
> using type() function. And isinstance() function tells the same:
>
> Python 2.5.2 (r252:60911, Dec 18 2008, 12:39:19)
> [GCC 4.2.1 (Apple Inc. build 5564)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sys
> >>> type(sys.stdout) is file
> True
> >>> isinstance(sys.stdout, file)
> True
>
> So, sys.stdout SHOULD do what the doc says, otherwise there is a bug
> either in implementation of sys.stdout, or in the documentation of file.
The documentation says:
file.encoding
The encoding that this file uses. When Unicode strings are written to a
file, they will be converted to byte strings using this encoding. In
addition, when the file is connected to a terminal, the attribute gives
the encoding that the terminal is likely to use (that information might
be incorrect if the user has misconfigured the terminal). The attribute
is read-only and may not be present on all file-like objects. It may also
be None, in which case the file uses the system default encoding for
converting Unicode strings.
New in version 2.3.
http://docs.python.org/library/stdtypes.html#file.encoding
And I agree that sys.stdout is a file. Using Python 2.6:
>>> type(sys.stdout)
<type 'file'>
I can confirm the behaviour you report:
>>> sys.stdout.encoding
'UTF-8'
>>> u = u"\u554a"
>>> print u
啊
>>> sys.stdout.write(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u554a' in position 0: ordinal not in range(128)
But if you explicitly convert the string, it works:
>>> sys.stdout.write(u.encode('utf-8'))
啊
I agree that this appears to be a bug, either in the write() method or
in the documentation.
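
Until that is resolved, the explicit conversion can be wrapped in a small
helper. What follows is only a sketch of the workaround, not a fix for
write() itself. The names write_text and FakeTerminal are made up for
illustration, and the "utf-8" fallback is my own assumption (the
documentation quoted above says a None encoding means the system default).
FakeTerminal stands in for a byte stream with an encoding attribute, so the
example does not depend on how your real terminal is configured.

```python
import io


class FakeTerminal(io.BytesIO):
    """Hypothetical stand-in for a byte stream that advertises an encoding."""
    encoding = "UTF-8"


def write_text(stream, text, fallback="utf-8"):
    """Encode text with the stream's advertised encoding, then write the bytes."""
    # The attribute may be missing or None, as the quoted documentation allows.
    # Falling back to "utf-8" is an assumption made for this sketch.
    encoding = getattr(stream, "encoding", None) or fallback
    data = text.encode(encoding)
    stream.write(data)
    return data


u = u"\u554a"

# Encoding with the ascii codec fails, which is the codec named in the traceback.
try:
    u.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding with the stream's own encoding succeeds.
term = FakeTerminal()
written = write_text(term, u)
```

On Python 2 the same helper should work with the real stream, as in
write_text(sys.stdout, u), since sys.stdout accepts byte strings there.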
--
Steven