9 Nov
2007
7:41 a.m.
Walter Dörwald wrote:
Martin v. Löwis wrote:
Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc. codecs to do the encoding. There's no need to create a magical mystery codec to pick out which though.

So the code is good, if it is inside an XML parser, and it's bad if it is inside a codec?

Exactly so. This functionality just *isn't* a codec - there is no encoding. Instead, it is an algorithm for *detecting* an encoding.
And what do you do once you've detected the encoding? You decode the input, so why not combine both into an XML decoder?
In fact, we already have such a codec. The utf-16 decoder looks at the first two bytes and then decides to forward the rest to either a utf-16-be or a utf-16-le decoder.

Servus,
Walter
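
To illustrate the utf-16 behaviour described above, here is a minimal Python sketch. The helper name decode_utf16_by_bom is hypothetical and only mimics the dispatch by hand; it is not the actual codec implementation. As a simplification, it raises an error when no byte order mark is present, which the built-in utf-16 decoder does not do.

```python
import codecs


def decode_utf16_by_bom(data: bytes) -> str:
    """Hypothetical helper: inspect the first two bytes, then forward
    the remainder to the utf-16-le or utf-16-be decoder."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    # Simplification for this sketch: refuse input without a BOM.
    raise ValueError("no UTF-16 byte order mark found")


# The same text, serialized in both byte orders with a leading BOM.
little = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
big = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")

# The built-in utf-16 codec accepts either byte order when a BOM is present.
assert little.decode("utf-16") == big.decode("utf-16") == "hi"

# The hand-written dispatch gives the same result.
assert decode_utf16_by_bom(little) == decode_utf16_by_bom(big) == "hi"
```

The point of the comparison is that "detect, then delegate to a concrete decoder" is already something a codec does for UTF-16; whether the same pattern should be extended to XML declarations is the question under discussion in this thread.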