
On Nov 9, 2007 3:59 PM, M.-A. Lemburg <mal@egenix.com> wrote:
Martin v. Löwis wrote:
It makes working with XML data a lot easier: you simply don't have to bother with the encoding of the XML data anymore and can just let the codec figure out the details. The XML parser can then work directly on the Unicode data.
Having the functionality indeed makes things easier. However, I don't find
s.decode(xml.detect_encoding(s))
particularly more difficult than
s.decode("xml-auto-detection")
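For concreteness, here is a rough sketch of what such a detection helper might look like. `detect_encoding` here is a hypothetical stand-in, not an existing function in the `xml` package, and it is deliberately simplified: it only looks at a byte order mark and at an ASCII-compatible XML declaration, which is much less than the detection procedure in the XML specification covers.

```python
import codecs
import re

# Hypothetical helper for illustration only; the xml package has no such function.
# Matches an XML declaration at the very start of ASCII-compatible data.
_DECLARATION = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

# UTF-32 must be tested before UTF-16: the UTF-32-LE BOM begins with the
# UTF-16-LE BOM. The generic "utf-16"/"utf-32" codecs consume the BOM themselves.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess the encoding of XML bytes from a BOM or the XML declaration."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECLARATION.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declared encoding: assume the caller's default.
    return default


s = '<?xml version="1.0" encoding="ISO-8859-1"?><a>\xe9</a>'.encode("latin-1")
text = s.decode(detect_encoding(s))
```

With a helper like this, the two-step spelling stays a one-liner as long as the whole document is already in memory as a byte string.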
Not really, but the codec has more control over what happens to the stream, i.e. it's easier to implement look-ahead in the codec than to do the detection and then try to push the bytes back onto the stream (which may or may not be possible, depending on the nature of the stream).
io.BufferedReader() standardizes a .peek() API, making it trivial. I don't see why we couldn't require it. (As an aside, .peek() will fail to do what detect_encoding() needs if BufferedReader's buffer size is too small. I do wonder if that limitation is appropriate.)

--
Adam Olsen, aka Rhamphoryncus
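To illustrate the .peek() approach, here is a minimal sketch. The names `open_xml_text` and `declared_encoding` are made up for this example, and the detection itself is a toy that only reads an ASCII-compatible XML declaration. The point is only that the peeked bytes stay in the buffer, so nothing has to be pushed back onto the stream.

```python
import io
import re


def declared_encoding(head: bytes, default: str = "utf-8") -> str:
    """Toy detector (an assumption for this sketch): read the XML declaration only."""
    match = re.match(rb'<\?xml[^>]*?encoding=["\']([\w.-]+)["\']', head)
    return match.group(1).decode("ascii") if match else default


def open_xml_text(raw, detect=declared_encoding, lookahead=1024):
    """Wrap a binary stream in a text stream, sniffing the encoding via peek()."""
    # The buffer must be at least as large as the look-ahead we want,
    # otherwise peek() cannot return enough bytes.
    reader = io.BufferedReader(raw, buffer_size=max(lookahead, io.DEFAULT_BUFFER_SIZE))
    # peek() does not advance the stream position; it may return fewer or
    # more bytes than requested, so slice to the look-ahead size.
    head = reader.peek(lookahead)[:lookahead]
    return io.TextIOWrapper(reader, encoding=detect(head))


data = '<?xml version="1.0" encoding="iso-8859-1"?><a>\xe9</a>'.encode("iso-8859-1")
text = open_xml_text(io.BytesIO(data)).read()

# The limitation mentioned above: with a tiny buffer, peek() cannot see
# far enough to reach the encoding declaration.
small = io.BufferedReader(io.BytesIO(data), buffer_size=4)
short_head = small.peek(100)
```

In this sketch `text` contains the whole document, declaration included, because the peek consumed nothing. `short_head` is capped by the 4-byte buffer, which is too little for any detector to work with.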