[Python-Dev] Decoding incomplete unicode
"Martin v. Löwis"
martin at v.loewis.de
Wed Aug 18 23:57:22 CEST 2004
Walter Dörwald wrote:
> But then a file that contains the two bytes 0x61, 0xc3
> will never generate an error when read via a UTF-8 reader.
> The trailing 0xc3 will just be ignored.
>
> Another option we have would be to add a final() method
> to the StreamReader, that checks if all bytes have been
> consumed.
Alternatively, we could add a .buffer() method that returns
any data that are still pending (either a Unicode string or
a byte string).
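To make the 0x61, 0xc3 case concrete, here is a minimal sketch. It uses codecs.getincrementaldecoder(), a standard-library API that postdates this thread and is not the patch under discussion; treat it only as an illustration of the two ideas above (a final check, and access to pending data).

```python
import codecs

# Incremental UTF-8 decoder from the standard library
# (assumption: a modern Python 3; this API is not part of the patch discussed here).
decoder = codecs.getincrementaldecoder("utf-8")()

# 0x61 is "a"; 0xc3 opens a two-byte sequence that is never completed.
text = decoder.decode(b"\x61\xc3")

# getstate() returns (pending_bytes, flags). The pending bytes are roughly
# what a .buffer() method would hand back.
pending, _flags = decoder.getstate()

# Declaring end of input is roughly what a final() check would do:
# the dangling 0xc3 is now reported instead of being silently dropped.
try:
    decoder.decode(b"", final=True)
    truncated = False
except UnicodeDecodeError:
    truncated = True
```

Without the final=True call, the trailing byte just stays in the decoder's buffer and no error is ever raised, which is the behaviour Walter describes.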
> Maybe this should be done by StreamReader.close()?
No. There is nothing wrong with only reading a part of a file.
> Now
> inShift counts the number of characters (and the shortcut
> for a "+-" sequence appearing together has been removed).
Ok. I didn't actually check the correctness of the individual
methods.
OTOH, I think time spent on UTF-7 is wasted, anyway.
> Would a version of the patch without a final argument but
> with a feed() method be accepted?
I don't see the need for a feed method. .read() should just
block until data are available, and that's it.
> I'm imagining implementing an XML parser that uses Python's
> unicode machinery and supports the
> xml.sax.xmlreader.IncrementalParser interface.
I think this is out of scope of this patch. The incremental
parser could implement a regular .read on a StringIO file
that also supports .feed.
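A rough sketch of that arrangement, assuming a modern Python 3. FeedableBytes is a hypothetical name, not an existing class, and it works on bytes rather than on a StringIO. The stream reader is the stock one from codecs.getreader().

```python
import codecs


class FeedableBytes:
    """Hypothetical byte stream: feed() appends data, read() drains it."""

    def __init__(self):
        self._pending = bytearray()

    def feed(self, data):
        # Called by the producer (e.g. an incremental parser) with new bytes.
        self._pending += data

    def read(self, size=-1):
        # Return up to `size` bytes, or everything available if size < 0.
        # Assumption: returns immediately; a real stream might block instead.
        if size is None or size < 0:
            size = len(self._pending)
        chunk = bytes(self._pending[:size])
        del self._pending[:size]
        return chunk


stream = FeedableBytes()
reader = codecs.getreader("utf-8")(stream)

stream.feed(b"a\xc3")    # "a" plus the first half of a two-byte character
first = reader.read()    # only the complete character comes back

stream.feed(b"\xb6")     # second half arrives later
second = reader.read()   # the reader joins it with the byte it kept
```

Here the feed method lives on the underlying stream, so the reader itself needs nothing beyond .read(). The sketch never blocks; it returns an empty result when no data is available.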
> Without the feed() method, we need the following:
>
> 1) A StreamQueue class that
Why is that? I thought we were talking about "Decoding
incomplete unicode"?
Regards,
Martin