[issue8796] Deprecate codecs.open()

Marc-Andre Lemburg report at bugs.python.org
Wed May 18 10:25:55 CEST 2011


Marc-Andre Lemburg <mal at egenix.com> added the comment:

STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner at haypocalc.com> added the comment:
> 
> Python 3.2 has been published. Can we start deprecating StreamWriter and StreamReader in Python 3.3 (to remove them from Python 3.4)? The doc should explain how to convert code using codecs into code using the io module (it should be simple), and using a StreamReader/StreamWriter should emit a warning.

This ticket is about deprecating codecs.open(), not about
StreamWriter and StreamReader.

The arguments mentioned here against doing that anytime soon
still stand.

I'm -1 on deprecating StreamWriter and StreamReader as they provide
different mechanisms than the io layer which has a specific focus
on files and buffers.

> --
> 
> codecs.StreamWriter writes twice the BOM of UTF-8-SIG, UTF-16, UTF-32 encodings if the file is opened in append mode or after a seek(0). Bug fixed in io.TextIOWrapper (issue #5006). io.TextIOWrapper calls also encoder.setstate(0) on a seek different than seek(0), whereas codecs.StreamWriter doesn't (it is not an incremental encoder, it doesn't have the setstate method).
> 
> codecs.StreamReader doesn't ignore the BOM of UTF-8-SIG, UTF-16 or UTF-32 encodings after seek(0). Bug fixed in io.TextIOWrapper (issue #4862).
> 
> These bugs should maybe be mentioned in the codecs doc, with a pointer to the io module saying that the io module handles these encodings correctly.

Those are not bugs of the generic codecs.StreamWriter/StreamReader
implementations or their concept. They are bugs in those specific
codecs.

The codecs StreamWriter and StreamReader concept was explicitly
designed to be able to have state. However, the generic implementation
does not make use of such state for the purpose of writing special
beginning-of-file markers - that's just way to specific for general
purpose implementations. They do use state to implement buffered
reads.

It would certainly be possible to make the implementations of
the codecs you mentioned smarter to handle writing BOMs correctly,
e.g. by making use of the incremental encoder/decoders, if there's
interest.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8796>
_______________________________________


More information about the Python-bugs-list mailing list