[Python-Dev] Decoding incomplete unicode

Hye-Shik Chang hyeshik at gmail.com
Thu Aug 19 14:21:50 CEST 2004


On Thu, 19 Aug 2004 12:29:12 +0200, M.-A. Lemburg <mal at egenix.com> wrote:
> Walter Dörwald wrote:
> > Without the feed() method, we need the following:
> >
> > 1) A StreamQueue class that
> >    a) supports writing at one end and reading at the other end
> >    b) has a method for pushing back unused bytes to be returned
> >       in the next call to read()
> 
> Right.
> 
> It also needs a method giving the number of pending bytes in
> the queue or just an API .has_pending_data() that returns
> True/False.
> 

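+1 for adding .has_pending_data().  But it will also need a way to
flush pending data out, for encodings where an incomplete sequence is
not always invalid. <wink> This is true for the JIS X 0213 encodings:
U+00E6 alone encodes to A9DC, but U+00E6 followed by U+0300 encodes
to the single character ABC4, so after seeing U+00E6 the encoder has
to hold it back until it knows what comes next.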

>>> u'\u00e6'.encode('euc-jisx0213')
'\xa9\xdc'
>>> u'\u3000'.encode('euc-jisx0213')
'\xa1\xa1'
>>> u'\u00e6\u0300'.encode('euc-jisx0213')
'\xab\xc4'
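
To make the "flush" requirement concrete, here is a minimal sketch.
It assumes Python 3 and the incremental encoder API in the codecs
module (codecs.getincrementalencoder), which is newer than this
thread and is not the StreamQueue design under discussion. The helper
name encode_in_pieces is made up for illustration only.

```python
import codecs

def encode_in_pieces(pieces, encoding="euc_jisx0213"):
    """Feed text pieces one at a time; return the bytes emitted at each step.

    Hypothetical helper for illustration, not part of any proposed API.
    """
    encoder = codecs.getincrementalencoder(encoding)()
    emitted = [encoder.encode(piece) for piece in pieces]
    # Second argument is final=True: flush whatever the encoder still holds.
    emitted.append(encoder.encode("", True))
    return emitted

# U+00E6 alone: nothing comes out until the final flush.
print(encode_in_pieces(["\u00e6"]))            # [b'', b'\xa9\xdc']
# U+00E6 then U+0300: the pair is emitted together as one character.
print(encode_in_pieces(["\u00e6", "\u0300"]))  # [b'', b'\xab\xc4', b'']
```

The empty first result in both cases is the pending state that
.has_pending_data() would report, and the final call is the kind of
explicit flush the queue would need.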


Hye-Shik