Walter Dörwald wrote:
But then a file that contains the two bytes 0x61, 0xc3 will never generate an error when read via a UTF-8 reader. The trailing 0xc3 will just be ignored.
Another option would be to add a final() method to the StreamReader that checks whether all bytes have been consumed.
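For comparison, Python's `codecs` module later took a closely related route: its incremental decoders accept a `final` flag on `decode()` instead of providing a separate `final()` method. The sketch below uses only that standard-library API and is not the patch under discussion. It shows the trailing 0xc3 being held back silently until the caller signals end of input, which turns it into an error.

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()

# 0x61 is "a"; 0xc3 starts a two-byte sequence whose second byte never arrives.
text = decoder.decode(b"\x61\xc3")  # returns "a"; 0xc3 is buffered, no error yet

try:
    # Signalling end of input makes the incomplete sequence an error.
    decoder.decode(b"", final=True)
    error = None
except UnicodeDecodeError as exc:
    error = exc
```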
Alternatively, we could add a .buffer() method that returns any data that are still pending (either a Unicode string or a byte string).
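The `.buffer()` alternative resembles `getstate()` on Python's standard-library incremental decoders (`codecs.getincrementaldecoder`). The first item of the returned tuple is the input that has not been decoded yet. This is a small sketch using that existing API, not the StreamReader method proposed here:

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()
decoded = decoder.decode(b"a\xc3")  # "a"
pending, flag = decoder.getstate()  # pending holds the undecoded trailing bytes

# A caller that knows the input has ended can treat leftover bytes as truncation.
truncated = len(pending) > 0
```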
Maybe this should be done by StreamReader.close()?
No. There is nothing wrong with only reading a part of a file.
Now inShift counts the number of characters (and the shortcut for a "+-" sequence appearing together has been removed).
Ok. I didn't actually check the correctness of the individual methods.
OTOH, I think time spent on UTF-7 is wasted, anyway.
Would a version of the patch without a final argument but with a feed() method be accepted?
I don't see the need for a feed method. .read() should just block until data are available, and that's it.
I'm imagining implementing an XML parser that uses Python's unicode machinery and supports the xml.sax.xmlreader.IncrementalParser interface.
I think this is out of scope of this patch. The incremental parser could implement a regular .read on a StringIO file that also supports .feed.
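The StringIO-plus-feed suggestion can be sketched with the standard `codecs.getreader` StreamReader wrapped around a small byte buffer. `FeedableBytesIO` is a hypothetical helper, not an existing class. Its `read()` returns `b""` instead of blocking when nothing is buffered. In current Python, the UTF-8 StreamReader keeps an incomplete trailing sequence between calls, so a character split across two feeds comes out whole once its remaining bytes arrive.

```python
import codecs


class FeedableBytesIO:
    """Hypothetical file-like object: feed() appends bytes, read() drains them."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data):
        # Append newly arrived bytes (e.g. from a socket or a parser's feed()).
        self._buffer.extend(data)

    def read(self, size=-1):
        # Return up to `size` buffered bytes (all of them if size < 0);
        # returns b"" rather than blocking when nothing is buffered.
        if size < 0 or size > len(self._buffer):
            size = len(self._buffer)
        chunk = bytes(self._buffer[:size])
        del self._buffer[:size]
        return chunk


stream = FeedableBytesIO()
reader = codecs.getreader("utf-8")(stream)

stream.feed(b"a\xc3")   # "\u00e4" is 0xc3 0xa4; only the first byte is here
first = reader.read()   # "a"; 0xc3 stays buffered inside the reader
stream.feed(b"\xa4")    # the missing continuation byte arrives
second = reader.read()  # "\u00e4"
```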
Without the feed() method, we need the following:
- A StreamQueue class that
Why is that? I thought we are talking about "Decoding incomplete unicode"?