[Python-Dev] Removing the implicit str() call from printing API

Andy Robinson andy@reportlab.com
Sun, 11 Feb 2001 09:18:55 -0000


> Open questions:
>
>     - If an encoding is specified, should file.read() then
>       always return Unicode objects?
>
>     - If an encoding is specified, should file.write() only
>       accept Unicode objects and not bytestrings?
>
>     - Is the encoding attribute mutable?  (I would prefer not,
>       but then how to apply an encoding to sys.stdout?)

Right now, codecs.open returns an instance of
codecs.StreamReaderWriter, not a native file object.  It has methods
that look like the ones on a file, but they typically accept or return
Unicode strings instead of binary ones.  This feels right to me
and is what Java does; if you want to switch encoding on sys.stdout,
you are not really doing anything to the file object, just switching
the wrapper you use.
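
As a rough sketch of what "switching the wrapper" means (this assumes
a current Python where str is Unicode, and uses an in-memory io.BytesIO
as a stand-in for the real byte stream under sys.stdout; the variable
names are only illustrative):

```python
import codecs
import io

# A byte-oriented stream standing in for the underlying file object.
raw = io.BytesIO()

# Two wrappers around the *same* byte stream, one per encoding.
latin1_out = codecs.getwriter("latin-1")(raw)
utf8_out = codecs.getwriter("utf-8")(raw)

# Each wrapper accepts Unicode text and encodes it on the way out.
latin1_out.write("caf\u00e9\n")
utf8_out.write("caf\u00e9\n")

data = raw.getvalue()

# Reading works the same way: wrap a byte stream, get Unicode back.
utf8_in = codecs.getreader("utf-8")(io.BytesIO(b"caf\xc3\xa9\n"))
text = utf8_in.read()
```

The byte stream itself never learns about encodings; only the wrapper
chosen for a given write or read does.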

There is much discussion on the i18n sig about 'unifying' binary
and Unicode strings at the moment.

> Side question: i noticed that the Lib/encodings directory supports
> quite a few code pages, including Greek, Russian, but there are no
> ISO-2022 CJK or JIS codecs.  Is this just because no one felt like
> writing one, or is there a reason not to include one?  It seems to
> me it might be nice to include some codecs for the most common CJK
> encodings -- that recent note on the popularity of Python in Korea
> comes to mind.

There have been 3 contributions to Asian codecs on the i18n sig in the
last six months (pythoncodecs.sourceforge.net): one C, two J and one K,
but some authors are uncomfortable with Python-style licenses.  They
need tying together into one integrated package with a test suite.

After a 5-month-long project which tied me up, I have finally started
looking at this. The general feeling was that the Asian codecs package
should be an optional download, but if we can get them fully tested
and do some compression magic it would be nice to get them in the
box one day.

- Andy Robinson