[Python-3000] Comment on iostack library
tomer filiba
tomerfiliba at gmail.com
Fri Sep 1 10:05:10 CEST 2006
very well, i'll use it. thanks.
On 9/1/06, Walter Dörwald <walter at livinglogic.de> wrote:
> tomer filiba wrote:
>
> > [...]
> > besides, encodings suffer from many issues. suppose you have a
> > damaged UTF-8 file, which you read char-by-char. when you reach the
> > damaged part, you'll never be able to "skip" it: the reader just keeps
> > read()ing bytes, hoping to make a character out of them, until it
> > reaches EOF, e.g.:
> >
> > def read_char(self):
> >     buf = ""
> >     while not self._stream.eof:
> >         buf += self._stream.read(1)
> >         try:
> >             return buf.decode("utf8")
> >         except ValueError:
> >             # an invalid byte stays in buf, so every later decode() fails too
> >             pass
> >
> > which leads me to the following thought: maybe we should have
> > an "enhanced" encoding library for py3k, which would report
> > *incomplete* data differently from *invalid* data. today both cases
> > raise a plain ValueError. suppose decode() raised IncompleteDataError
> > when the given data is not sufficient to be decoded successfully,
> > and ValueError when the data is simply corrupted.
> >
> > that could aid iostack greatly.
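For illustration, the proposed split can be approximated on top of an incremental decoder, the mechanism described in the reply. This is a minimal Python 3 sketch, not Python 2.5 code: `IncompleteDataError` is the hypothetical name from the proposal (no real codec raises it) and `decode_utf8` is a hypothetical helper. It relies on the incremental decoder's getstate() returning the still-undecoded bytes as the first item of its state tuple.

```python
import codecs


class IncompleteDataError(ValueError):
    """Hypothetical exception from the proposal: input stops mid-character."""


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8, telling truncated input apart from corrupted input.

    Raises IncompleteDataError if `data` ends in the middle of a multi-byte
    sequence, and UnicodeDecodeError (itself a ValueError subclass) if it
    contains bytes that can never form valid UTF-8.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    # final=False: invalid bytes raise right away, incomplete ones are buffered
    text = decoder.decode(data)
    pending, _ = decoder.getstate()  # bytes still waiting for more input
    if pending:
        raise IncompleteDataError(
            f"{len(pending)} trailing byte(s) of an incomplete sequence")
    return text
```

Because both exceptions derive from ValueError, code that only catches ValueError keeps working, while a reader can catch IncompleteDataError alone and go fetch more bytes.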
>
> We *do* have that functionality in Python 2.5: incremental decoders can
> retain an incomplete byte sequence from one call of the decode() method
> to the next. Only when final=True is passed to decode() will they treat
> incomplete and invalid data in the same way: by raising an
> exception.
>
> Incomplete input:
> >>> import codecs
> >>> d = codecs.lookup("utf-8").incrementaldecoder()
> >>> d.decode("\xe1")
> u''
> >>> d.decode("\x88")
> u''
> >>> d.decode("\xb4")
> u'\u1234'
>
> Invalid input:
> >>> import codecs
> >>> d = codecs.lookup("utf-8").incrementaldecoder()
> >>> d.decode("\x80")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/var/home/walter/checkouts/Python/test/Lib/codecs.py", line 256, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
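A char-by-char reader built on the incremental decoder no longer has to scan to EOF when it meets damaged bytes: incomplete sequences stay buffered, and invalid ones are reported as soon as they are seen. A minimal Python 3 sketch, assuming a binary stream whose read(1) returns b"" at EOF; `CharReader` is a hypothetical helper, not part of iostack. With errors="replace" (the default here) damaged bytes come back as U+FFFD, errors="ignore" skips them, and errors="strict" raises UnicodeDecodeError as in the session above.

```python
import codecs
import io


class CharReader:
    """Hypothetical char-by-char reader over a binary stream."""

    def __init__(self, stream, encoding="utf-8", errors="replace"):
        self._stream = stream
        self._decoder = codecs.getincrementaldecoder(encoding)(errors=errors)
        self._pending = ""  # decoded characters not yet returned

    def read_char(self):
        """Return the next character, or "" at EOF."""
        while not self._pending:
            byte = self._stream.read(1)
            if not byte:
                # EOF: flush the decoder; a truncated trailing sequence is
                # replaced, skipped, or raises, depending on `errors`
                self._pending = self._decoder.decode(b"", final=True)
                break
            # incomplete input yields "" and stays buffered in the decoder
            self._pending = self._decoder.decode(byte)
        char, self._pending = self._pending[:1], self._pending[1:]
        return char


if __name__ == "__main__":
    reader = CharReader(io.BytesIO(b"ab\x80cd"), errors="ignore")
    print("".join(iter(reader.read_char, "")))  # prints "abcd"
```

The `_pending` buffer is needed because one decode() call can return more than one character, e.g. a replacement character followed by the valid byte that exposed the error.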
>
> Incomplete input with final=True:
> >>> import codecs
> >>> d = codecs.lookup("utf-8").incrementaldecoder()
> >>> d.decode("\xe1", final=True)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/var/home/walter/checkouts/Python/test/Lib/codecs.py", line 256, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 0: unexpected end of data
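In a read loop, this means decode() can be fed chunks of any size and called once more with final=True at EOF, which is where a truncated trailing sequence finally surfaces as an error. A minimal Python 3 sketch; `decode_chunks` is a hypothetical helper name.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Decode byte chunks that may split characters at arbitrary points."""
    decoder = codecs.getincrementaldecoder(encoding)()
    # a character split across chunks is held back until its last byte arrives
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes the buffer: a sequence still incomplete at the end
    # of input now raises UnicodeDecodeError instead of being held back
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)
```

For example, decode_chunks([b"\xe1", b"\x88", b"\xb4"]) returns "\u1234", while decode_chunks([b"ok", b"\xe1"]) raises UnicodeDecodeError.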
>
> Servus,
> Walter
>
>