[Python-Dev] Decoding incomplete unicode
Hye-Shik Chang
hyeshik at gmail.com
Thu Aug 19 14:21:50 CEST 2004
On Thu, 19 Aug 2004 12:29:12 +0200, M.-A. Lemburg <mal at egenix.com> wrote:
> Walter Dörwald wrote:
> > Without the feed method(), we need the following:
> >
> > 1) A StreamQueue class that
> > a) supports writing at one end and reading at the other end
> > b) has a method for pushing back unused bytes to be returned
> > in the next call to read()
>
> Right.
>
> It also needs a method giving the number of pending bytes in
> the queue or just an API .has_pending_data() that returns
> True/False.
>
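For reference, the queue described in the quoted text could look roughly like the sketch below. It is only an illustration written for Python 3 bytes: StreamQueue and has_pending_data() are the names used in this thread, while write(), read() and pushback() are hypothetical method names chosen here, not an agreed API.

```python
class StreamQueue:
    """Hypothetical byte queue: written at one end, read at the other."""

    def __init__(self):
        self._buffer = bytearray()

    def write(self, data):
        # Append new bytes at the tail of the queue.
        self._buffer += data

    def read(self, size=-1):
        # Remove and return up to `size` bytes from the head
        # (everything that is queued if size is negative).
        if size < 0:
            size = len(self._buffer)
        chunk = bytes(self._buffer[:size])
        del self._buffer[:size]
        return chunk

    def pushback(self, data):
        # Put unused bytes back at the head so that the next
        # read() returns them first.
        self._buffer[:0] = data

    def has_pending_data(self):
        # True while any bytes are still waiting in the queue.
        return len(self._buffer) > 0


queue = StreamQueue()
queue.write(b'\xa1\xa1\xa9')      # one full pair plus a lone lead byte
data = queue.read()
queue.pushback(data[2:])          # the decoder could not use the last byte
queue.write(b'\xdc')              # the rest of the pair arrives later
remaining = queue.read()
```

Here the byte that was pushed back is returned together with the byte that arrives later, so a decoder sees the complete pair on its next read.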
+1 for adding .has_pending_data(). But it will also need a way to
flush pending data out, for encodings where an incomplete sequence is
not always invalid. <wink> This is true for the JIS X 0213 encodings:
>>> u'\u00e6'.encode('euc-jisx0213')
'\xa9\xdc'
>>> u'\u3000'.encode('euc-jisx0213')
'\xa1\xa1'
>>> u'\u00e6\u0300'.encode('euc-jisx0213')
'\xab\xc4'
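In other words, U+00E6 on its own maps to one byte pair, but U+00E6 followed by U+0300 maps to a different single pair, so an encoder that sees only U+00E6 cannot emit anything until it knows what comes next, or is told that nothing more is coming. As a minimal sketch of that behaviour, assuming Python 3 and the codecs.getincrementalencoder() interface (which postdates this thread and is used here purely as an illustration):

```python
import codecs

# Assumes a Python 3 build that ships the standard euc_jisx0213 codec.
encoder = codecs.getincrementalencoder('euc_jisx0213')()

# U+00E6 may still combine with a following U+0300, so the encoder
# keeps it pending rather than emitting bytes right away.
held = encoder.encode('\u00e6')

# Feeding the combining mark completes the pair (b'\xab\xc4').
combined = encoder.encode('\u0300')

# When nothing follows, final=True flushes the pending character
# as the stand-alone encoding (b'\xa9\xdc').
encoder.encode('\u00e6')
flushed = encoder.encode('', final=True)

print(held, combined, flushed)
```

The final=True call plays the role of the flush discussed above: without it the pending U+00E6 would never be written out at the end of the stream.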
Hye-Shik