Hye-Shik Chang wrote:
On Tue, 27 Jul 2004 22:39:45 +0200, Walter Dörwald <walter@livinglogic.de> wrote:
Python's Unicode machinery currently has problems when decoding incomplete input.
When codecs.StreamReader.read() encounters a decoding error it reads more bytes from the input stream and retries decoding. This is broken for two reasons: 1) The error might be due to a malformed byte sequence in the input, a problem that can't be fixed by reading more bytes. 2) There may be no more bytes available at this time. Once more data is available decoding can't continue because bytes from the input stream have already been read and thrown away. (sio.DecodingInputFilter has the same problems)
StreamReaders and -Writers from the CJK codecs do not suffer from these problems because they keep an internal buffer for state and for the incomplete bytes of a sequence. In fact, the CJK codecs have their own implementations of UTF-8 and UTF-16, based on their multibytecodec system. They already provide a "working" StreamReader/Writer. :)
Seems you had the same problems with the builtin stream readers! ;) BTW, how do you solve the problem that incomplete byte sequences are retained in the middle of a stream, but should generate errors at the end?
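To make the question above concrete, here is a minimal sketch of the behaviour being asked for: buffer an incomplete sequence in mid-stream, reject malformed bytes immediately, and complain about a truncated sequence only at the end. It uses codecs.getincrementaldecoder() and its final flag, which exist in later Python versions (2.5 and up) and are not part of the patch or the CJK codecs discussed in this thread; it is an illustration in Python 3 syntax, not either author's implementation.

```python
import codecs

# Assumption: a Python version that provides incremental decoders (2.5+).
decoder = codecs.getincrementaldecoder("utf-8")()
euro = "\u20ac".encode("utf-8")  # three bytes: b'\xe2\x82\xac'

# Mid-stream: the first two bytes are incomplete, so nothing is returned
# and the bytes are kept in the decoder's internal buffer.
first = decoder.decode(euro[:2])
# The missing byte arrives and the buffered sequence is completed.
second = decoder.decode(euro[2:])

# A malformed byte cannot be fixed by reading more input, so it fails at once.
decoder.reset()
try:
    decoder.decode(b"\xff")
    malformed_rejected = False
except UnicodeDecodeError:
    malformed_rejected = True

# End of stream: final=True turns a still-incomplete sequence into an error.
decoder.reset()
decoder.decode(euro[:2])
try:
    decoder.decode(b"", final=True)
    truncated_at_end = False
except UnicodeDecodeError:
    truncated_at_end = True
```

The point of the final flag is that the caller, not the decoder, knows whether more bytes can still arrive.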
I've uploaded a patch that fixes these problems to SF: http://www.python.org/sf/998993
The patch implements a few additional features:
- read() has an additional argument chars that can be used to specify the number of characters that should be returned.
- readline() is supported on all readers derived from codecs.StreamReader.
I have no comment for these, yet.
- readline() and readlines() have an additional option for dropping the u"\n".
+1
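As a rough illustration of the features listed above, the sketch below uses the StreamReader API as it exists in current CPython, where read() accepts chars and readline() accepts keepends. The chars name comes from the message itself; that the newline-dropping option is spelled keepends is how current CPython exposes it, and the thread does not say whether the patch used that name. The sample text is made up, and the code is Python 3.

```python
import codecs
import io

# Made-up sample data: two UTF-8 encoded lines with non-ASCII characters.
raw = "h\u00e9llo\nw\u00f6rld\n".encode("utf-8")

# read(chars=N) returns N decoded characters, regardless of how many
# bytes they occupy in the underlying stream.
char_reader = codecs.getreader("utf-8")(io.BytesIO(raw))
first_three = char_reader.read(chars=3)

# readline(keepends=False) drops the trailing newline; the default keeps it.
line_reader = codecs.getreader("utf-8")(io.BytesIO(raw))
stripped = line_reader.readline(keepends=False)
kept = line_reader.readline()
```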
I wonder whether we need to add an optional argument to writelines() that appends a newline character to each line, then.
This would probably be a nice, convenient additional feature, but of course you could always pass a generator expression to writelines():

    stream.writelines(line+u"\n" for line in lines)

Bye,
Walter Dörwald
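For reference, a self-contained version of the writelines() idiom suggested above, written in Python 3 syntax (so without the u prefix) and using io.StringIO as a stand-in for the stream; the lines list is made-up sample data.

```python
import io

# Made-up sample data: lines without trailing newlines.
lines = ["first", "second", "third"]

# io.StringIO stands in for any text stream with a writelines() method.
stream = io.StringIO()

# writelines() adds no separators itself, so the generator expression
# appends the newline to each line as it is written.
stream.writelines(line + "\n" for line in lines)

written = stream.getvalue()
```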