M.-A. Lemburg wrote:
On 2007-11-09 14:10, Walter Dörwald wrote:
Martin v. Löwis wrote:
Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc codecs to do the encoding. There's no need to create a magical mystery codec to pick out which though.
So the code is good, if it is inside an XML parser, and it's bad if it is inside a codec?
Exactly so. This functionality just *isn't* a codec - there is no encoding. Instead, it is an algorithm for *detecting* an encoding.
And what do you do once you've detected the encoding? You decode the input, so why not combine both into an XML decoder?
FWIW: I'm +1 on adding such a codec.
It makes working with XML data a lot easier: you simply don't have to bother with the encoding of the XML data anymore and can just let the codec figure out the details. The XML parser can then work directly on the Unicode data.
Exactly. I have a version of sgmlop lying around that does that.
Whether it needs to be in C or not is another question (I would have done this in Python, since performance is not really an issue), but since the code is already written, why not use it?
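For illustration only, here is a rough pure-Python sketch of the "detect, then decode" combination discussed above. It is not the C codec under discussion. The function names detect_xml_encoding and decode_xml are hypothetical, and the heuristics (byte order marks, then the leading bytes of "<?", then the encoding pseudo-attribute in the XML declaration, then a UTF-8 default) are assumed from the non-normative autodetection appendix of the XML 1.0 specification.

```python
import codecs
import re

# Byte order marks. The UTF-32 marks are tested first because the
# UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Leading bytes of "<" or "<?" in the wide encodings when no BOM is present.
_PREFIXES = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
]

# The encoding pseudo-attribute of an XML declaration, for ASCII-compatible input.
_DECL = re.compile(
    r"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def detect_xml_encoding(data):
    """Return (encoding_name, number_of_BOM_bytes_to_skip) for XML bytes."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    for prefix, name in _PREFIXES:
        if data.startswith(prefix):
            return name, 0
    # Assume an ASCII-compatible encoding and read the declaration, if any.
    # latin-1 maps every byte to a character, so this step cannot fail.
    match = _DECL.match(data[:200].decode("latin-1"))
    if match:
        return match.group(1).lower(), 0
    return "utf-8", 0  # assumed default when nothing else is declared


def decode_xml(data):
    """Detect the encoding of XML bytes and return the decoded text."""
    encoding, skip = detect_xml_encoding(data)
    # Raises LookupError if the declared encoding is unknown to Python.
    return data[skip:].decode(encoding)
```

With something like this, a caller would pass raw bytes to decode_xml() and hand the resulting Unicode string to the parser. A real codec would additionally need incremental (streaming) decoding, which this sketch leaves out.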
Servus, Walter