
Walter Dörwald wrote:
Python's unicode machinery currently has problems when decoding incomplete input.
When codecs.StreamReader.read() encounters a decoding error, it reads more bytes from the input stream and retries decoding. This is broken for two reasons: 1) The error might be due to a malformed byte sequence in the input, a problem that can't be fixed by reading more bytes. 2) There may be no more bytes available at this time. Once more data is available, decoding can't continue, because bytes from the input stream have already been read and thrown away. (sio.DecodingInputFilter has the same problems.)
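To illustrate the difference between incomplete and malformed input, here is a minimal sketch. It assumes a current Python 3 and uses codecs.getincrementaldecoder(), which is not the API of the patch discussed in this thread; the sample string is made up for the example.

```python
import codecs

data = "Dörwald".encode("utf-8")   # b'D\xc3\xb6rwald'
truncated = data[:2]                # stops in the middle of the two-byte "ö"

# One-shot decoding treats the truncated input as an error.
try:
    truncated.decode("utf-8")
    one_shot_failed = False
except UnicodeDecodeError:
    one_shot_failed = True

# A stateful decoder keeps the incomplete trailing byte instead of
# throwing it away, so decoding can continue once more data arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
first = decoder.decode(truncated)   # only the complete character "D"
second = decoder.decode(data[2:])   # buffered 0xc3 plus the new bytes

# A genuinely malformed byte is still reported immediately;
# reading more input could never repair it.
try:
    codecs.getincrementaldecoder("utf-8")().decode(b"\xff")
    malformed_detected = False
except UnicodeDecodeError:
    malformed_detected = True
```

The point of the sketch is only that undecoded bytes have to be kept between calls; how a stream reader should buffer them is exactly what the thread is debating.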
I've uploaded a patch that fixes these problems to SF: http://www.python.org/sf/998993
The patch implements a few additional features:
- read() has an additional argument chars that can be used to specify the number of characters that should be returned.
- readline() is supported on all readers derived from codecs.StreamReader().
- readline() and readlines() have an additional option for dropping the u"\n".
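The features listed above can be sketched as follows. This assumes the codecs.StreamReader found in current Python 3, where the arguments are named chars and keepends; the patch under review may have spelled them differently, and the sample text is invented for the example.

```python
import codecs
import io

# An in-memory byte stream standing in for a file or socket.
raw = io.BytesIO("äöü\nsecond line\n".encode("utf-8"))
reader = codecs.getreader("utf-8")(raw)

# Ask for a number of characters rather than a number of bytes.
first_two = reader.read(chars=2)

# Read the rest of the current line and drop the trailing newline.
rest_of_line = reader.readline(keepends=False)

# Read all remaining lines, again without their line endings.
remaining = reader.readlines(keepends=False)
```

Counting characters instead of bytes matters for variable-width encodings such as UTF-8, where "äö" occupies four bytes.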
The patch is still missing changes to the escape codecs ("unicode_escape" and "raw_unicode_escape") and I haven't touched the CJK codecs, but it has test cases that check the new functionality for all affected codecs (UTF-7, UTF-8, UTF-16, UTF-16-LE, UTF-16-BE).
Could someone take a look at the patch?
Just did... please see the comments in the SF tracker.

I like the idea, but don't think the implementation is the right way to do it. Instead, I'd suggest using a new error handling strategy "break" (= break processing as soon as errors are found).

The advantage of this approach is twofold:

* no new APIs or API changes are required
* other codecs (including third-party ones) can easily implement the same strategy

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jul 27 2004)
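As a rough illustration of the "break" idea, the following sketch registers a custom error handler with codecs.register_error(), which exists in current Python. The handler name "break_sketch", the helper decode_until_error() and the module-level list are hypothetical names made up for this example; this is not the implementation proposed in the tracker, only one way the strategy might be approximated.

```python
import codecs

# Hypothetical side channel that records where decoding stopped.
_error_positions = []

def break_sketch(exc):
    """Stop decoding at the first error instead of raising."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    _error_positions.append(exc.start)
    # Empty replacement, and resume at the very end of the input,
    # so the codec decodes nothing after the offending position.
    return ("", len(exc.object))

# "break_sketch" is an assumed name, not a standard error handler.
codecs.register_error("break_sketch", break_sketch)

def decode_until_error(data, encoding="utf-8"):
    """Return (decoded text, number of bytes consumed before the error)."""
    _error_positions.clear()
    text = data.decode(encoding, "break_sketch")
    consumed = _error_positions[0] if _error_positions else len(data)
    return text, consumed
```

A caller could keep data[consumed:] and retry once more bytes arrive. Note that this handler alone cannot tell a truncated sequence from a malformed one; that distinction would still have to come from the codec.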