[Python-Dev] Decoding incomplete unicode
mal at egenix.com
Tue Jul 27 22:59:59 CEST 2004
Walter Dörwald wrote:
> Python's Unicode machinery currently has problems when decoding
> incomplete input.
> When codecs.StreamReader.read() encounters a decoding error, it
> reads more bytes from the input stream and retries decoding.
> This is broken for two reasons:
> 1) The error might be due to a malformed byte sequence in the input,
> a problem that can't be fixed by reading more bytes.
> 2) There may be no more bytes available at this time. Once more
> data is available, decoding can't continue because bytes from
> the input stream have already been read and thrown away.
> (sio.DecodingInputFilter has the same problems.)
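The two failure modes can be told apart in present-day Python, whose UTF-8 decoding function accepts a final flag; whether that flag corresponds to anything in the patch is an assumption. In this minimal sketch (the classify helper is hypothetical), final=False leaves an incomplete trailing sequence unconsumed, while a malformed byte still raises:

```python
import codecs


def classify(data: bytes) -> str:
    """Hypothetical helper: tell an incomplete tail apart from malformed UTF-8."""
    try:
        # final=False: an incomplete sequence at the end is left unconsumed
        # instead of raising UnicodeDecodeError.
        _text, consumed = codecs.utf_8_decode(data, "strict", False)
    except UnicodeDecodeError:
        return "malformed"   # reading more bytes cannot repair this
    if consumed < len(data):
        return "incomplete"  # the tail may become valid once more bytes arrive
    return "complete"


print(classify(b"\xc3"))      # incomplete (first byte of U+00E4)
print(classify(b"\xff"))      # malformed (0xff never starts a UTF-8 sequence)
print(classify(b"\xc3\xa4"))  # complete
```

A reader built on this distinction can keep the unconsumed bytes and retry once more data arrives, and it can report malformed input immediately instead of waiting for data that will not help.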
> I've uploaded a patch that fixes these problems to SF:
> The patch implements a few additional features:
> - read() has an additional argument chars that can be used to
> specify the number of characters that should be returned.
> - readline() is supported on all readers derived from
> - readline() and readlines() have an additional option
> for dropping the trailing u"\n".
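For illustration, present-day CPython's codecs.StreamReader offers read(size=-1, chars=-1, firstline=False), readline(size=None, keepends=True) and readlines(sizehint=None, keepends=True), which resemble the features listed above; that they match the patch exactly is an assumption. The sketch feeds a reader from a hypothetical TrickleStream that returns one byte per read() call. Because multi-byte characters therefore arrive split across calls, the reader has to keep the undecoded bytes between calls:

```python
import codecs


class TrickleStream:
    """Hypothetical byte source that returns at most one byte per read() call."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def read(self, size: int = -1) -> bytes:
        # Ignore size and hand out a single byte, simulating slowly arriving data.
        chunk = self._data[self._pos:self._pos + 1]
        self._pos += len(chunk)
        return chunk


data = "h\u00e4llo\nw\u00f6rld\n".encode("utf-8")
reader = codecs.getreader("utf-8")(TrickleStream(data))

print(reader.read(chars=2))              # hä
print(reader.readline(keepends=False))   # llo
print(reader.readlines(keepends=False))  # ['wörld']
```

The second byte of each UTF-8 character arrives in a separate read() call. The reader holds the first byte in its internal byte buffer until the character can be completed.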
> The patch is still missing changes to the escape codecs
> ("unicode_escape" and "raw_unicode_escape") and I haven't
> touched the CJK codecs, but it has test cases that check
> the new functionality for all affected codecs
> (UTF-7, UTF-8, UTF-16, UTF-16-LE, UTF-16-BE).
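A test in the spirit of the ones described could feed each listed codec its encoded input one byte at a time and check that nothing is lost. This sketch uses codecs.getincrementaldecoder(), an API from later Python versions rather than the patch under discussion. The names decode_bytewise, CODECS and SAMPLE are hypothetical:

```python
import codecs

# The codecs the post says the patch's tests cover.
CODECS = ["utf-7", "utf-8", "utf-16", "utf-16-le", "utf-16-be"]
SAMPLE = "h\u00e4llo w\u00f6rld \u20ac\n"


def decode_bytewise(data: bytes, encoding: str) -> str:
    """Feed data to an incremental decoder one byte at a time."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(data[i:i + 1]) for i in range(len(data))]
    # final=True flushes buffered bytes and raises if the input was truncated.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


for name in CODECS:
    assert decode_bytewise(SAMPLE.encode(name), name) == SAMPLE, name
print("all codecs round-trip when fed one byte at a time")
```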
> Could someone take a look at the patch?
Just did... please see the comments in the SF tracker.
I like the idea, but don't think the implementation is
the right way to do it. Instead, I'd suggest using a new
error handling strategy, "break" (i.e. stop processing as
soon as an error is found).
The advantage of this approach is twofold:
* no new APIs or API changes are required
* other codecs (including third-party ones) can easily
implement the same strategy
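Python's codecs.register_error() binds a name to an error handler, so a "break" strategy could be prototyped without touching codec internals. The following minimal sketch rests on assumptions: the DecodingStopped exception and the decode_until_error() helper are hypothetical, and it stops by raising from the handler. That is one possible mechanism, not semantics defined by the proposal:

```python
import codecs
from typing import Tuple


class DecodingStopped(Exception):
    """Hypothetical signal raised by the "break" handler at the first error."""

    def __init__(self, position: int) -> None:
        super().__init__(position)
        self.position = position  # index of the first undecodable byte


def break_handler(exc: UnicodeError):
    # Stop at the first decoding error instead of replacing or skipping bytes.
    if isinstance(exc, UnicodeDecodeError):
        raise DecodingStopped(exc.start)
    raise exc  # leave encode/translate errors untouched


codecs.register_error("break", break_handler)


def decode_until_error(data: bytes, encoding: str) -> Tuple[str, bytes]:
    """Hypothetical helper: return (decoded prefix, undecoded remainder)."""
    try:
        return data.decode(encoding, "break"), b""
    except DecodingStopped as stop:
        head = data[:stop.position]
        return head.decode(encoding), data[stop.position:]


print(decode_until_error(b"ab\xc3", "utf-8"))      # ('ab', b'\xc3')
print(decode_until_error(b"a\x00b", "utf-16-le"))  # ('a', b'b')
```

The handler is looked up by name, so any codec that reports errors through the callback mechanism picks it up without changes. As written, the sketch stops at malformed bytes and at an incomplete tail alike, and leaves it to the caller to tell the two apart.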