On Wed, 3 May 2000, Fredrik Lundh wrote:
> Guido van Rossum <email@example.com> wrote:
> > But there must be a way to turn on Unicode-awareness on e.g. stdout and then printing a Unicode object should not use str() (as it currently does).
> to throw some extra gasoline on this, how about allowing str() to return unicode strings?
You still need to *print* them somehow. One way or another, stdout is still just a stream with bytes on it, unless we augment file objects to understand encodings.
stdout sends bytes to something -- and that something will interpret the stream of bytes in some encoding (could be Latin-1, UTF-8, ISO-2022-JP, whatever). So either:
1. You explicitly downconvert to bytes, and specify the encoding each time you do. Then write the bytes to stdout (or your file object).
2. The file object is smart and can be told what encoding to use, and Unicode strings written to the file are automatically converted to bytes.
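
A minimal sketch of the two options, not a proposal for the actual API. It assumes a present-day Python 3 interpreter, uses io.BytesIO as a stand-in for stdout or a file object, and picks UTF-8 arbitrarily; the variable names are made up for illustration.

```python
import codecs
import io

text = u"caf\u00e9"

# Option 1: explicitly downconvert to bytes, naming the encoding each time,
# then write the bytes to the stream.
raw1 = io.BytesIO()  # stand-in for stdout or a file object
raw1.write(text.encode("utf-8"))

# Option 2: tell a wrapper once which encoding to use; Unicode strings
# written through it are converted to bytes automatically.
raw2 = io.BytesIO()
smart = codecs.getwriter("utf-8")(raw2)
smart.write(text)

# Either way, the underlying stream ends up holding the same bytes.
same_bytes = raw1.getvalue() == raw2.getvalue()
```

Both routes put identical bytes on the stream; the difference is only where the encoding decision lives.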
Another thread mentioned having separate read/write and binary_read/binary_write methods on files. I suggest doing it the other way, actually: since read/write operate on byte streams now, *they* are the binary operations; the new methods should be the ones that do the extra encoding/decoding work, and could be called uniread/uniwrite, uread/uwrite, textread/textwrite, etc.
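
To illustrate that naming scheme, here is a rough sketch assuming Python 3. The UniFile class and its uread/uwrite methods are hypothetical, invented only to show the split; nothing like it exists on real file objects.

```python
import io


class UniFile:
    """Hypothetical wrapper: read/write stay binary, uread/uwrite do the
    extra encoding/decoding work."""

    def __init__(self, raw, encoding):
        self.raw = raw            # underlying byte stream
        self.encoding = encoding  # encoding used by uread/uwrite only

    def read(self, size=-1):
        # Binary operation: bytes come back untouched.
        return self.raw.read(size)

    def write(self, data):
        # Binary operation: bytes go out untouched.
        return self.raw.write(data)

    def uread(self, size=-1):
        # Simplification: size counts bytes, not characters, so a partial
        # read could split a multi-byte sequence in some encodings.
        return self.raw.read(size).decode(self.encoding)

    def uwrite(self, text):
        # Text operation: encode, then hand the bytes to the stream.
        return self.raw.write(text.encode(self.encoding))


f = UniFile(io.BytesIO(), "latin-1")
f.uwrite(u"na\u00efve")   # encoded on the way out
f.write(b"!")             # plain bytes, no encoding step
f.raw.seek(0)
raw_bytes = f.read()
f.raw.seek(0)
decoded = f.uread()
```

Existing callers of read/write keep their byte-stream behaviour; only code that opts into the new methods pays for the encoding step.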
> (extra questions: how about renaming "unicode" to "string", and getting rid of "unichr"?)
Would you expect chr(x) to return an 8-bit string when x < 128, and a Unicode string when x >= 128?
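
To make the question concrete, here is what such a merged function would have to look like. This is a sketch assuming Python 3, where bytes plays the role of the 8-bit string; mixed_chr is a hypothetical name and is not how the built-in chr() behaves.

```python
def mixed_chr(x):
    # Hypothetical: the return type depends on the value of the argument.
    if x < 128:
        return bytes([x])  # 8-bit string
    return chr(x)          # Unicode string


low = mixed_chr(65)
high = mixed_chr(233)
types_differ = type(low) is not type(high)
```

A caller cannot know which type comes back without inspecting the argument first.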