Re: [Python-Dev] Decoding incomplete unicode

19 Aug 2004


Martin v. Löwis wrote:
...
Walter Dörwald wrote:
...
They will not, because StreamReader.decode() already is a feed
style API (but with state amnesia).
Any stream decoder that I can think of can be (and most are)
implemented by overriding decode().
I consider that an unfortunate implementation artefact. You
either use the stateless encode/decode that you get from
codecs.getencoder()/codecs.getdecoder() or you use the file API
on the streams. You never ever use encode/decode on streams.
That is exactly the problem with the current API.
StreamReader mixes two concepts:

1) The stateful API, which allows decoding a byte input
    in chunks, with the state of the decoder kept between
    calls.
2) A file API where the chunks to be decoded are read
    from a byte stream.
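
To make the two concepts concrete, here is a minimal sketch in
present-day Python 3. It assumes the incremental decoder interface
(codecs.getincrementaldecoder) that later Python versions provide,
which is not the API as it existed when this thread was written. The
sample string is made up for illustration.

```python
import codecs
import io

data = "a\u00e4b".encode("utf-8")  # illustrative sample: b'a\xc3\xa4b'

# Concept 1: stateful decoding. The caller hands in the chunks and the
# decoder remembers an incomplete byte sequence between calls.
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = [decoder.decode(data[i:i + 1]) for i in range(len(data))]
stateful_result = "".join(pieces) + decoder.decode(b"", final=True)

# Concept 2: file API. The reader pulls its chunks from a byte stream.
reader = codecs.getreader("utf-8")(io.BytesIO(data))
file_result = reader.read()
```

In the first case the caller decides where the bytes come from; in the
second the reader owns the stream, which is why input that does not
come from a stream has to be wrapped in one first.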
...
I would have preferred it if the default .write implementation
had called self._internal_encode, and the Writer
*contained* a Codec, rather than inheriting from Codec.
This would separate the two concepts from above.
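
A rough sketch of what such a containing Writer might look like, in
present-day Python 3. The class name ComposedWriter is hypothetical,
and the use of codecs.getincrementalencoder as the contained object is
an assumption made for illustration (it was added to Python later);
only the _internal_encode name comes from the text above.

```python
import codecs
import io


class ComposedWriter:
    """Hypothetical writer that holds an encoder instead of inheriting from Codec."""

    def __init__(self, stream, encoding, errors="strict"):
        self.stream = stream
        # The stateful encoder is a separate object owned by the writer.
        self._encoder = codecs.getincrementalencoder(encoding)(errors)

    def _internal_encode(self, text, final=False):
        # All encoding goes through the contained encoder.
        return self._encoder.encode(text, final)

    def write(self, text):
        # The file API only forwards the encoded bytes to the stream.
        self.stream.write(self._internal_encode(text))


buffer = io.BytesIO()
writer = ComposedWriter(buffer, "utf-8")
writer.write("gr\u00fc\u00dfe")
encoded = buffer.getvalue()
```

With this layout the writer exposes no encode()/decode() methods of
its own, so the stateless and the stream-based interfaces cannot be
confused with each other.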
...
Alas, for (I guess) simplicity, a more direct (and more
confusing) approach was taken.
...
1) Having feed() as part of the StreamReader API:
---
s = u"???".encode("utf-8")
r = codecs.getreader("utf-8")()
for c in s:
    print r.feed(c)
Isn't that a totally unrelated issue? Aren't we talking about
short reads on sockets etc?
We're talking about two problems:

1) The current implementation does not really support the
    stateful API, because trailing incomplete byte sequences
    lead to errors.
2) The current file API is not really convenient for decoding
    when the input is not read from a stream.
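
The first problem can be reproduced in a few lines. This is only a
sketch in present-day Python 3: the two-byte sample is made up, and
codecs.getincrementaldecoder is again assumed as a stand-in for a
decoder that keeps its state, since it did not exist at the time of
this thread.

```python
import codecs

data = "\u00e4".encode("utf-8")     # two bytes: b'\xc3\xa4'
first, second = data[:1], data[1:]  # a short read splits the character

# One-shot decoding of a chunk that ends mid-character raises an error.
try:
    first.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

# A decoder that keeps state returns nothing for the first byte and
# completes the character once the second byte arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
after_first = decoder.decode(first)
after_second = decoder.decode(second)
```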
...
I would very much prefer to solve one problem at a time.
Bye,
    Walter Dörwald