[Python-Dev] Decoding incomplete unicode

M.-A. Lemburg mal at egenix.com
Tue Jul 27 22:59:59 CEST 2004

Walter Dörwald wrote:
> Python's Unicode machinery currently has problems when decoding
> incomplete input.
> When codecs.StreamReader.read() encounters a decoding error it
> reads more bytes from the input stream and retries decoding.
> This is broken for two reasons:
> 1) The error might be due to a malformed byte sequence in the input,
>    a problem that can't be fixed by reading more bytes.
> 2) There may be no more bytes available at this time. Once more
>    data is available decoding can't continue because bytes from
>    the input stream have already been read and thrown away.
> (sio.DecodingInputFilter has the same problems)
> I've uploaded a patch that fixes these problems to SF:
> http://www.python.org/sf/998993
> The patch implements a few additional features:
> - read() has an additional argument chars that can be used to
>   specify the number of characters that should be returned.
> - readline() is supported on all readers derived from
>   codecs.StreamReader().
> - readline() and readlines() have an additional option
>   for dropping the u"\n".
> The patch is still missing changes to the escape codecs
> ("unicode_escape" and "raw_unicode_escape") and I haven't
> touched the CJK codecs, but it has test cases that check
> the new functionality for all affected codecs
> (UTF-7, UTF-8, UTF-16, UTF-16-LE, UTF-16-BE).
> Could someone take a look at the patch?

Just did... please see the comments in the SF tracker.
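
For readers following along, the two failure cases in the quoted message
(incomplete input versus malformed input) can be illustrated with a small
sketch. It is written for present-day Python 3 and uses the standard
library's incremental decoder; it is not the code from the SF patch, and
the helper name is_malformed is made up for this example.

```python
import codecs

# An incremental decoder buffers an incomplete trailing byte sequence
# instead of raising an error or discarding it.
decoder = codecs.getincrementaldecoder("utf-8")()

data = "Dörwald".encode("utf-8")      # 'ö' is the two bytes 0xC3 0xB6
first, second = data[:2], data[2:]    # split in the middle of 'ö'

part1 = decoder.decode(first)         # only "D" is complete so far
part2 = decoder.decode(second)        # the buffered 0xC3 is completed by 0xB6


def is_malformed(chunk):
    """Hypothetical helper: True if chunk can never be decoded as UTF-8."""
    fresh = codecs.getincrementaldecoder("utf-8")()
    try:
        fresh.decode(chunk)           # final=False: incomplete input is allowed
    except UnicodeDecodeError:
        return True                   # reading more bytes cannot fix this
    return False
```

Here b"\xc3" on its own is merely incomplete, while b"\xff" is malformed,
so only the second case should be reported as an error.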

I like the idea, but I don't think the implementation is
the right way to do it. Instead, I'd suggest using a new
error handling strategy "break" (i.e. stop processing as
soon as an error is found).

The advantage of this approach is twofold:

* no new APIs or API changes are required

* other codecs (including third-party ones) can easily
   implement the same strategy
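
To make the suggestion concrete, here is a rough emulation of what a
"break" strategy might do, written for present-day Python 3. Note that
"break" is only a proposal in this mail and is not an existing error
handler name; the function decode_break below is a hypothetical stand-in
that gets a similar effect by catching the exception and returning the
cleanly decoded prefix together with the number of bytes consumed.

```python
def decode_break(data, encoding="utf-8"):
    """Hypothetical emulation of a "break" strategy.

    Returns (decoded_text, bytes_consumed). Decoding stops at the first
    error; the caller keeps data[bytes_consumed:] and can retry once more
    input has arrived.
    """
    try:
        return data.decode(encoding), len(data)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded
        return data[:exc.start].decode(encoding), exc.start


# Incomplete input: stop before the dangling lead byte 0xC3
text, consumed = decode_break(b"D\xc3")
pending = b"D\xc3"[consumed:]          # bytes to prepend to the next chunk
```

A caller such as a stream reader would keep the pending bytes around
rather than throwing them away, which is the second problem described in
the quoted message. This sketch does not by itself tell incomplete input
apart from malformed input; that decision is left to the caller.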

Marc-Andre Lemburg

