[Python-Dev] Decoding incomplete unicode
M.-A. Lemburg
mal at egenix.com
Thu Aug 19 14:25:55 CEST 2004
Hye-Shik Chang wrote:
> On Thu, 19 Aug 2004 12:29:12 +0200, M.-A. Lemburg <mal at egenix.com> wrote:
>
>>Walter Dörwald wrote:
>>
>>>Without the feed() method, we need the following:
>>>
>>>1) A StreamQueue class that
>>> a) supports writing at one end and reading at the other end
>>> b) has a method for pushing back unused bytes to be returned
>>> in the next call to read()
>>
>>Right.
>>
>>It also needs a method giving the number of pending bytes in
>>the queue or just an API .has_pending_data() that returns
>>True/False.
>>
>
>
> +1 for adding .has_pending_data() stuff. But it'll need a way to
> flush pending data out for encodings where an incomplete sequence is
> not always invalid. <wink> This is true for JIS X 0213 encodings.
>
> >>> u'\u00e6'.encode('euc-jisx0213')
> '\xa9\xdc'
> >>> u'\u3000'.encode('euc-jisx0213')
> '\xa1\xa1'
> >>> u'\u00e6\u0300'.encode('euc-jisx0213')
> '\xab\xc4'
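
The encodings quoted above show the problem. U+00E6 on its own encodes to '\xa9\xdc', but when it is followed by the combining U+0300 the pair encodes to the single code '\xab\xc4'. An encoder that has seen only U+00E6 therefore cannot know yet what to emit, and something must flush it if the stream ends there. As a hedged illustration only (this API postdates the message and is not part of the proposal discussed here), later Python versions expose this through incremental encoders whose encode() method takes a `final` flag. The sketch assumes the CPython codec name `euc_jisx0213`:

```python
import codecs

# Incremental encoder factory for EUC-JIS X 0213 (codec as registered in CPython).
make_encoder = codecs.getincrementalencoder("euc_jisx0213")

# Case 1: U+00E6 arrives in one chunk and the combining U+0300 in the next.
encoder = make_encoder()
first = encoder.encode("\u00e6")  # may be held back as pending, not emitted yet
combined = first + encoder.encode("\u0300", final=True)

# Case 2: the stream ends right after U+00E6; final=True flushes the pending char.
encoder = make_encoder()
alone = encoder.encode("\u00e6") + encoder.encode("", final=True)

print(combined)
print(alone)
```

The `final=True` call plays the role of the "flush pending data" step asked for above: without it, a trailing U+00E6 would stay buffered inside the encoder.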

I'm not sure I understand. The queue will also have an .unread()
method (or similar) to write data back into the queue at the
reading head position. Are you suggesting that we add a .truncate()
method to truncate the read buffer at the current position ?
Since the queue will be in memory, we can also add .writeseek()
and .readseek() if that helps.
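
A minimal sketch of the queue discussed in this thread, assuming a single in-memory bytearray as the buffer. StreamQueue is not an existing standard-library class. The names unread(), has_pending_data() and truncate() come from the discussion, while pending() is a hypothetical name for "the number of pending bytes". The exact semantics of all methods are assumptions. In particular, truncate() is read here as discarding every byte not yet read, which is only one interpretation of "truncate the read buffer at the current position":

```python
class StreamQueue:
    """Hypothetical in-memory byte queue: written at one end, read at the other."""

    def __init__(self):
        self._buffer = bytearray()

    def write(self, data):
        # Append bytes at the writing end.
        self._buffer += data

    def read(self, size=-1):
        # Remove and return up to `size` bytes from the reading end
        # (all pending bytes if size is negative).
        if size < 0 or size > len(self._buffer):
            size = len(self._buffer)
        data = bytes(self._buffer[:size])
        del self._buffer[:size]
        return data

    def unread(self, data):
        # Push unused bytes back so the next read() returns them first.
        self._buffer[:0] = data

    def pending(self):
        # Number of bytes still waiting to be read (hypothetical name).
        return len(self._buffer)

    def has_pending_data(self):
        return len(self._buffer) > 0

    def truncate(self):
        # Assumed meaning: drop everything that has not been read yet.
        del self._buffer[:]


# Usage: a decoder that can only consume complete UTF-8 sequences.
queue = StreamQueue()
queue.write(b"abc\xc3")          # 0xC3 starts a two-byte UTF-8 sequence
data = queue.read()
text = data[:3].decode("utf-8")  # only b"abc" can be decoded so far
queue.unread(data[3:])           # push the incomplete tail back
queue.write(b"\xa9")             # the rest of the sequence arrives later
text += queue.read().decode("utf-8")
print(text)
print(queue.has_pending_data())
```

Keeping everything in one bytearray makes unread() a simple prepend. The cost is a copy of the remaining buffer on each call, so a queue of chunks might suit large streams better.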
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Aug 19 2004)