How to ask sax for the file encoding

Diez B. Roggisch deets at nospam.web.de
Wed Oct 4 10:57:39 EDT 2006


Edward K. Ream wrote:

>>> Can anyone tell me how the content handler can determine the encoding of
>>> the file?  Can sax provide this info?
> 
>> there is no encoding on the "inside" of an XML document; it's all
>> Unicode.
> 
> True, but sax is reading the file, so sax is producing the unicode, so it
> should (must) be able to determine the encoding. 

It is, by reading the XML declaration at the top of the file.
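
If the declared encoding is really needed, one possible route is to go below sax and ask expat directly. The following is only a sketch, assuming Python 3 and the standard xml.parsers.expat module; the helper name declared_encoding is made up for illustration. It reports what the declaration says, which is None when the file has no declaration at all.

```python
import xml.parsers.expat


def declared_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None if absent.

    declared_encoding is a hypothetical helper, not part of any library.
    """
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # expat calls this once, when it parses the <?xml ...?> declaration.
        found["encoding"] = encoding

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return found.get("encoding")


sample = b'<?xml version="1.0" encoding="utf-8"?><root/>'
print(declared_encoding(sample))
```

Note that this parses the whole document just to read its first line, so it is meant as an illustration rather than something to run on large files.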

> Furthermore, xml files 
> start with lines like:
> 
> <?xml version="1.0" encoding="utf-8"?>
> 
> so it would seem reasonable for sax to be able to return 'utf-8' somehow.
> Am I missing something?

You are missing that sax outputs unicode, which no longer has an encoding
associated with it. The original encoding is thus pretty much irrelevant
information. It _could_ be retained, but for what purpose?
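
To illustrate the point, here is a small sketch, assuming Python 3 and the standard xml.sax module; TextCollector and parse_text are names invented for this example. The same text stored in two different byte encodings reaches the content handler as identical unicode strings, so the handler has nothing left to tell them apart.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data it receives."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # content is already a decoded unicode string at this point.
        self.chunks.append(content)


def parse_text(data: bytes) -> str:
    """Parse an XML document given as bytes and return its character data."""
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return "".join(handler.chunks)


text = "caf\u00e9"
utf8_doc = ('<?xml version="1.0" encoding="utf-8"?><a>%s</a>' % text).encode("utf-8")
latin1_doc = ('<?xml version="1.0" encoding="iso-8859-1"?><a>%s</a>' % text).encode("iso-8859-1")

# The bytes on disk differ, but the handler sees the same string both times.
print(utf8_doc != latin1_doc)
print(parse_text(utf8_doc) == parse_text(latin1_doc))
```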

Diez
