[Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
walter at livinglogic.de
Tue May 24 12:16:49 CEST 2011
On 24.05.11 02:08, Victor Stinner wrote:
> codecs.open() and StreamReader, StreamWriter and StreamReaderWriter
> classes of the codecs module don't support universal newlines, still
> have some issues with stateful codecs (like UTF-16/32 BOMs), and each
> codec has to implement a StreamReader and a StreamWriter class.
> StreamReader and StreamWriter are stateless codecs (no reset() or
> setstate() method),
They *are* stateful, they just don't expose their state to the public.
> and so it's not possible to write a generic fix for
> all child classes in the codecs module. Each stateful codec has to
> handle special cases like seek() problems.
Yes, which in theory makes it possible to implement shortcuts for
certain codecs (e.g. the UTF-32-BE/LE codecs could simply multiply the
character position by 4 to get the byte position). However AFAICR none
of the readers/writers does that.
> For example, UTF-16 codec
> duplicates some IncrementalEncoder/IncrementalDecoder code into its
> StreamWriter/StreamReader class.
Actually it's the other way round: When I implemented the incremental
codecs, I copied code from the StreamReader/StreamWriter classes.
> The io module is well tested, supports non-seekable streams, handles
> correctly corner-cases (like UTF-16/32 BOMs) and supports any kind of
> newlines including an "universal newline" mode. TextIOWrapper reuses
> incremental encoders and decoders, so BOM issues were fixed only once,
> in TextIOWrapper.
> It's trivial to replace a call to codecs.open() by a call to open(),
> because the two API are very close. The main different is that
> codecs.open() doesn't support universal newline, so you have to use
> open(..., newline='') to keep the same behaviour (keep newlines
> unchanged). This task can be done by 2to3. But I suppose that most
> people will be happy with the universal newline mode.
> I don't see which usecase is not covered by TextIOWrapper. But I know
> some cases which are not supported by StreamReader/StreamWriter.
This could be be partially fixed by implementing generic
StreamReader/StreamWriter classes that reuse the incremental codecs, but
I don't think thats worth it.
More information about the Python-Dev