[issue20132] Many incremental codecs don’t handle fragmented data

Walter Dörwald report at bugs.python.org
Fri Jan 10 12:26:50 CET 2014


Walter Dörwald added the comment:

The best solution IMHO would be to implement real incremental codecs for all of those.
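
For reference, a minimal sketch of what I mean by a "real" incremental codec. It uses the stdlib UTF-8 decoder, which is my choice of illustration and not one of the codecs this issue is about. The input is fed one byte at a time, so the two-byte character is split across calls, yet the joined output matches a one-shot decode:

```python
import codecs

data = "£1".encode("utf-8")  # three bytes; "£" takes the first two

decoder = codecs.getincrementaldecoder("utf-8")()
# Feed the input one byte at a time, then flush with final=True.
pieces = [decoder.decode(data[i:i + 1]) for i in range(len(data))]
pieces.append(decoder.decode(b"", final=True))
result = "".join(pieces)

print(pieces, result)
```

The first call returns an empty string because the decoder holds the incomplete character until the second byte arrives.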

Maybe iterencode() with an empty iterator should never call encode()? (But IMHO it would be better to document that iterencode()/iterdecode() should only be used with "real" codecs.)
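
A rough sketch of the first option, assuming the stdlib iterencode() always finishes with a final encode("", True) call. The name iterencode_skip_empty is hypothetical and not part of the codecs module; it makes the final call only if the iterator produced at least one item:

```python
import codecs


def iterencode_skip_empty(iterator, encoding, errors="strict"):
    """Like codecs.iterencode(), but never calls encode() for an empty iterator.

    Hypothetical helper for illustration only.
    """
    encoder = codecs.getincrementalencoder(encoding)(errors)
    seen_input = False
    for chunk in iterator:
        seen_input = True
        output = encoder.encode(chunk)
        if output:
            yield output
    if seen_input:
        # Flush the encoder only when something was actually encoded.
        output = encoder.encode("", True)
        if output:
            yield output


empty = list(iterencode_skip_empty(iter([]), "utf-8"))
chunks = list(iterencode_skip_empty(["a", "ä"], "utf-8"))
```

This only hides the problem for the empty case; a codec that is not really incremental still misbehaves as soon as the iterator yields more than one chunk.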

Note that the comment before PyUnicode_DecodeUTF7Stateful() in unicodeobject.c reads:

/* The decoder.  The only state we preserve is our read position,
 * i.e. how many characters we have consumed.  So if we end in the
 * middle of a shift sequence we have to back off the read position
 * and the output to the beginning of the sequence, otherwise we lose
 * all the shift state (seen bits, number of bits seen, high
 * surrogate). */

Changing that would require introducing a state object that the codec updates and from which decoding can be restarted.
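
The back-off described in that comment can be observed from Python. This is a small sketch, assuming codecs.utf_7_decode() is the Python-level wrapper around the stateful decoder; the input is cut in the middle of the shift sequence "+AKM-" (which encodes "£"):

```python
import codecs

# Stateful decode (final=False) of input that ends inside a shift sequence.
# The decoder reports only the bytes before the "+" as consumed.
text, consumed = codecs.utf_7_decode(b"hi +AK", "strict", False)

# The incremental decoder keeps the unconsumed bytes and retries them
# together with the next chunk.
decoder = codecs.getincrementaldecoder("utf-7")()
first = decoder.decode(b"hi +AK")
second = decoder.decode(b"M-1", final=True)

print(repr(text), consumed, repr(first), repr(second))
```

So the only thing carried between calls is the raw undecoded input, not the shift state itself.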

Also, the encoder does not buffer anything. To implement the suggested behaviour, it might have to buffer an unbounded amount of data.
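
To illustrate the encoder side, here is a sketch that assumes the UTF-7 incremental encoder simply encodes each chunk on its own, which is how I read the current code. Each call then opens and closes its own shift sequence, so the chunked output differs from a one-shot encode of the same text, although both decode to the same string:

```python
import codecs

encoder = codecs.getincrementalencoder("utf-7")()
# Encode "££" in two separate calls.
chunked = encoder.encode("£") + encoder.encode("£", final=True)

# Encode the same text in one call.
one_shot = "££".encode("utf-7")

print(chunked, one_shot)
```

Both byte strings are valid UTF-7 for the same text; the chunked one is just longer because the shift sequence is restarted for every call.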

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue20132>
_______________________________________

