[Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

Tue May 24 10:03:22 CEST 2011

Victor Stinner wrote:
> Hi,
> 
> In Python 2, codecs.open() is the best way to read and/or write files
> using Unicode. But in Python 3, open() is preferred with its fast io
> module. I would like to deprecate codecs.open() because it can be
> replaced by open() and io.TextIOWrapper. I would like your opinion and
> that's why I'm writing this email.

I think you should have moved this part of your email
further up, since it explains the reason why this idea was
rejected for now:

> I opened an issue for this idea. Brett and Marc-Andre Lemburg don't
> want to deprecate codecs.open() & friends because they want to be able
> to write code working on Python 2 and on Python 3 without any change. I
> don't think it's realistic: nontrivial programs require at least the six
> module, and most likely the 2to3 program. The six module can have its
> "codecs.open" function if codecs.open is removed from Python 3.4.

And now for something completely different:

> codecs.open() and StreamReader, StreamWriter and StreamReaderWriter
> classes of the codecs module don't support universal newlines, still
> have some issues with stateful codecs (like UTF-16/32 BOMs), and each
> codec has to implement a StreamReader and a StreamWriter class.
> 
> StreamReader and StreamWriter are stateless codecs (no reset() or
> setstate() method), and so it's not possible to write a generic fix for
> all child classes in the codecs module. Each stateful codec has to
> handle special cases like seek() problems. For example, UTF-16 codec
> duplicates some IncrementalEncoder/IncrementalDecoder code into its
> StreamWriter/StreamReader class.

Please read PEP 100 regarding StreamReader and StreamWriter.
Those parts of the codec API were explicitly designed to be
stateful, unlike the stateless encoder/decoder methods.
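
A minimal sketch of that distinction, assuming Python 3 and UTF-8 as
the example encoding (the helper names are hypothetical, chosen only
for illustration): a StreamReader obtained via codecs.getreader() keeps
undecoded bytes between reads, while a one-shot stateless decode of a
truncated fragment fails.

```python
import codecs
import io


def decode_in_small_reads(data, encoding="utf-8", size=1):
    """Decode `data` through a StreamReader, requesting `size` units per read.

    Hypothetical helper: the reader buffers incomplete byte sequences
    internally, so small reads do not break multi-byte characters.
    """
    reader = codecs.getreader(encoding)(io.BytesIO(data))
    pieces = []
    while True:
        piece = reader.read(size)
        if not piece:  # an empty string signals end of stream
            break
        pieces.append(piece)
    return "".join(pieces)


def stateless_decode_fails(fragment, encoding="utf-8"):
    """Return True if a one-shot decode of `fragment` raises UnicodeDecodeError."""
    try:
        fragment.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False
```

Reading in such small units splits every multi-byte sequence, yet the
reader still returns the full text because it carries the incomplete
bytes over to the next read.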

Please read my reply on the ticket:

"""
StreamReader and StreamWriter classes provide the base codec
implementations for stateful interaction with streams. They
define the interface and provide a working implementation for
those codecs that choose not to implement their own variants.

Each codec can, however, implement variants which are optimized
for the specific encoding or intercept certain stream methods
to add functionality or improve the encoding/decoding
performance.

Both are essential parts of the codec interface.

TextIOWrapper and StreamReaderWriter are merely wrappers
around streams that make use of the codecs. They don't
provide any codec logic themselves. That's the conceptual
difference.
"""
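
As a rough illustration of that wrapper relationship (a sketch assuming
Python 3; read_with_both_wrappers is a hypothetical helper name): both
wrappers are given a byte stream and rely on the codec registered under
the given name for the actual decoding.

```python
import codecs
import io


def read_with_both_wrappers(data, encoding="utf-8"):
    """Decode the same bytes through both wrapper classes.

    Hypothetical helper for illustration only.
    """
    info = codecs.lookup(encoding)  # CodecInfo registered for this encoding

    # codecs wrapper: delegates to the codec's StreamReader/StreamWriter.
    srw = codecs.StreamReaderWriter(
        io.BytesIO(data), info.streamreader, info.streamwriter
    )
    via_codecs = srw.read()

    # io wrapper: newline="" disables newline translation so that the
    # result is comparable with the codecs wrapper.
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    via_io = wrapper.read()

    return via_codecs, via_io
```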

> The io module is well tested, supports non-seekable streams, correctly
> handles corner cases (like UTF-16/32 BOMs) and supports any kind of
> newlines including a "universal newline" mode. TextIOWrapper reuses
> incremental encoders and decoders, so BOM issues were fixed only once,
> in TextIOWrapper.
> 
> It's trivial to replace a call to codecs.open() with a call to open(),
> because the two APIs are very close. The main difference is that
> codecs.open() doesn't support universal newlines, so you have to use
> open(..., newline='') to keep the same behaviour (keep newlines
> unchanged). This task can be done by 2to3. But I suppose that most
> people will be happy with the universal newline mode.
> 
> I don't see which use case is not covered by TextIOWrapper. But I know
> some cases which are not supported by StreamReader/StreamWriter.
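
The newline difference described in the quoted text can be sketched as
follows. This is only an illustration, assuming Python 3;
read_three_ways is a hypothetical helper name, and the
DeprecationWarning filter is a precaution in case the running Python
version warns about codecs.open().

```python
import codecs
import os
import tempfile
import warnings


def read_three_ways(raw, encoding="utf-8"):
    """Write `raw` bytes to a temporary file and read it back three ways.

    Returns (via codecs.open, via open(newline=""), via open() default).
    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)

        # codecs.open() opens the file in binary mode underneath,
        # so newlines are passed through unchanged.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", DeprecationWarning)
            with codecs.open(path, "r", encoding=encoding) as f:
                via_codecs = f.read()

        # newline="" keeps newlines unchanged, matching codecs.open().
        with open(path, "r", encoding=encoding, newline="") as f:
            via_open_untranslated = f.read()

        # Default mode: universal newlines, "\r\n" is translated to "\n".
        with open(path, "r", encoding=encoding) as f:
            via_open_universal = f.read()
    finally:
        os.remove(path)
    return via_codecs, via_open_untranslated, via_open_universal
```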

This is a misunderstanding of the concepts behind the two.

StreamReader and StreamWriter classes are implemented by the codecs;
they are part of the API that each codec has to provide in order
to register in the Python codecs system. Their purpose is
to provide a stateful interface and to work efficiently and
directly on streams rather than buffers.

Here's my reply from the ticket regarding using incremental
encoders/decoders for the StreamReader/Writer parts of the
codec set of APIs:

"""
The point about having them use incremental codecs for encoding
and decoding is a good one and would need to be investigated.
If possible, we could use incremental encoders/decoders for the
standard StreamReader/Writer base classes or add new
IncrementalStreamReader/Writer classes which then use the
IncrementalEncoder/Decoder by default.

Please open a new ticket for this.
"""
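
For context, the incremental codec interface mentioned in the ticket
reply can be exercised directly. A minimal sketch, assuming Python 3
and UTF-16 as the example encoding (decode_chunks and encode_chunks
are hypothetical helper names):

```python
import codecs


def decode_chunks(chunks, encoding="utf-16"):
    """Decode an iterable of byte chunks with one incremental decoder."""
    decoder = codecs.getincrementaldecoder(encoding)()
    out = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes the decoder and reports truncated input, if any.
    out.append(decoder.decode(b"", final=True))
    return "".join(out)


def encode_chunks(pieces, encoding="utf-16"):
    """Encode an iterable of strings with one incremental encoder."""
    encoder = codecs.getincrementalencoder(encoding)()
    out = [encoder.encode(piece) for piece in pieces]
    out.append(encoder.encode("", final=True))
    return b"".join(out)
```

The encoder emits the BOM only once, and the decoder copes with a BOM
split across chunks, because each object carries its state between
calls.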

> StreamReader, StreamWriter, StreamReaderEncoder and EncodedFile are not
> used in the Python 3 standard library. I tried removing them: except
> for the tests in test_codecs that exercise them directly, the full test
> suite passes.
>
> Read the issue for more information: http://bugs.python.org/issue8796

-- 
Marc-Andre Lemburg
eGenix.com
