[Python-Dev] Feed style codec API

Walter Dörwald walter at livinglogic.de
Wed Jan 12 19:18:04 CET 2005


Now that Python 2.4 is out the door (and the problems with
StreamReader.readline() are hopefully fixed), I'd like to bring
up the topic of a feed style codec API again. A feed style API
would make it possible to use stateful encoding/decoding where
the data is not available as a stream.

Two examples:

- xml.sax.xmlreader.IncrementalParser: Here the client passes raw
   XML data to the parser in multiple calls to the feed() method.
   If the parser wants to use Python's codec machinery, it has to
   wrap a stream interface around the data passed to the feed()
   method.
- WSGI (PEP 333) specifies that the web application returns the
   fragments of the resulting webpage as an iterator. If this result
   is encoded unicode we have the same problem: This must be wrapped
   in a stream interface.
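
To illustrate the workaround described above, here is a minimal sketch of
wrapping a stream interface around an iterator of unicode fragments so that
the codec keeps its state between fragments. It is written for current
Python 3. The function name encode_fragments and the choice of UTF-16 are
assumptions made for this illustration and are not taken from PEP 333 or
from the patch.

```python
import codecs
import io


def encode_fragments(fragments, encoding="utf-16"):
    """Yield one encoded chunk per text fragment, sharing codec state.

    Hypothetical helper: a StreamWriter is wrapped around a throwaway
    in-memory stream purely so the codec state survives between fragments.
    """
    buffer = io.BytesIO()
    writer = codecs.getwriter(encoding)(buffer)
    for fragment in fragments:
        writer.write(fragment)
        chunk = buffer.getvalue()
        # Empty the scratch stream so the next chunk starts clean.
        buffer.seek(0)
        buffer.truncate()
        yield chunk


fragments = ["<html>", "</html>"]

# Stateful: the byte order mark is written once, with the first fragment.
stateful = list(encode_fragments(fragments))

# Stateless: every fragment is encoded on its own and gets its own BOM.
stateless = [fragment.encode("utf-16") for fragment in fragments]
```

The scratch BytesIO exists only to satisfy the stream interface, which is the
overhead a feed() method would avoid.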

The simplest solution is to add a feed() method to both StreamReader
and StreamWriter that takes the state of the codec into account
but doesn't use the stream. This can be done by simply moving a
few lines of code into separate methods. I've uploaded a patch to
Sourceforge: #1101097.
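
As a rough sketch of the idea (not the code from the patch), a feed style
decoder can keep the undecoded tail of the previous call as its state and
never touch a stream. The class name FeedDecoder, its attributes, and the
final argument are hypothetical. The example is limited to UTF-8 and written
for current Python 3, where codecs.utf_8_decode() reports how many bytes it
consumed.

```python
import codecs


class FeedDecoder:
    """Hypothetical feed style UTF-8 decoder; not the API from the patch."""

    def __init__(self, errors="strict"):
        self.errors = errors
        # Bytes of an incomplete sequence left over from earlier calls.
        self.pending = b""

    def feed(self, data, final=False):
        """Decode as much of the buffered input as possible and return it."""
        data = self.pending + data
        # With final=False a truncated trailing sequence is not an error;
        # the decoder simply reports that it consumed fewer bytes.
        text, consumed = codecs.utf_8_decode(data, self.errors, final)
        self.pending = data[consumed:]
        return text


raw = "Dörwald".encode("utf-8")
chunks = [raw[:2], raw[2:]]  # the split falls inside the two-byte "ö"

decoder = FeedDecoder()
pieces = [decoder.feed(chunk) for chunk in chunks]
pieces.append(decoder.feed(b"", final=True))  # flush; raises if bytes remain
result = "".join(pieces)
```

The first call returns only "D" and holds back the lone lead byte of "ö"; the
second call completes the character. An encoder counterpart would follow the
same pattern in the other direction.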

There are other open issues with the codec changes: unicode-escape,
UTF-7, the CJK codecs and probably a few others don't support
decoding incomplete input yet (although AFAICR the functionality
is mostly there in the CJK codecs).

Bye,
    Walter Dörwald

