How to ask sax for the file encoding
Diez B. Roggisch
deets at nospam.web.de
Wed Oct 4 10:57:39 EDT 2006
Edward K. Ream wrote:
>>> Can anyone tell me how the content handler can determine the encoding of
>>> the file? Can sax provide this info?
>
>> there is no encoding on the "inside" of an XML document; it's all
>> Unicode.
>
> True, but sax is reading the file, so sax is producing the unicode, so it
> should (must) be able to determine the encoding.
It is, by reading the XML declaration at the top of the file.
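
To illustrate, here is a minimal sketch, assuming Python 3 and the standard
xml.sax module. The names TextCollector and text_of are made up for this
example. The same text is encoded two different ways, and the handler receives
identical unicode strings in both cases, because the parser decodes the bytes
according to the declaration before the handler ever sees them.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that just gathers character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # content is already a decoded (unicode) string here
        self.chunks.append(content)


def text_of(data):
    """Parse a bytes document and return its character data as one string."""
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return "".join(handler.chunks)


# The same document, serialized in two different encodings.
utf8_doc = '<?xml version="1.0" encoding="utf-8"?><a>caf\u00e9</a>'.encode("utf-8")
latin1_doc = '<?xml version="1.0" encoding="iso-8859-1"?><a>caf\u00e9</a>'.encode("iso-8859-1")

# Both yield the same unicode text; the byte encoding is gone by this point.
print(text_of(utf8_doc) == text_of(latin1_doc))
```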
> Furthermore, xml files
> start with lines like:
>
> <?xml version="1.0" encoding="utf-8"?>
>
> so it would seem reasonable for sax to be able to return 'utf-8' somehow.
> Am I missing something?
That sax outputs unicode, which no longer has any encoding associated with
it. So the original encoding is pretty much irrelevant information. It _could_
be retained, but for what purpose?
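
If the declared encoding really is needed (say, to write the file back out the
same way), one possible approach is to go below sax and ask expat directly.
This is only a sketch, assuming Python 3 and the standard xml.parsers.expat
module, whose parser objects accept an XmlDeclHandler callback. The function
name declared_encoding is invented for this example, and the content handler
itself still never sees the encoding.

```python
import xml.parsers.expat


def declared_encoding(data):
    """Return the encoding named in the XML declaration, or None.

    None is returned both when there is no declaration at all and when
    the declaration omits the encoding attribute.
    """
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # Called by expat once, when it parses the <?xml ...?> declaration.
        found["encoding"] = encoding

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)  # True: this is the final chunk of input
    return found.get("encoding")


doc = b'<?xml version="1.0" encoding="utf-8"?><root>caf\xc3\xa9</root>'
print(declared_encoding(doc))
```

Note that this reports what the declaration says, which is exactly the
information sax throws away once it has produced unicode.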
Diez