[XML-SIG] Re: Re: SAX encoding and special characters

Mike Brown mike at skew.org
Sun Apr 18 04:54:14 EDT 2004


Fredrik Lundh wrote:
> and yes, since you have to read the entire document into a string, you can
> extract the encoding from that string.  here's a fairly robust RE that
> should do the trick:
> 
>     m = re.match(r"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]", data)
>     if m:
>         encoding = m.group(1)

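That works as long as the string itself is Unicode or is encoded with a
superset of ASCII. It won't work on UTF-16 (with a BOM), UTF-16LE, or
UTF-16BE strings: each ASCII character takes two bytes there, so the byte
sequence "<?xml" never appears contiguously and the RE can't match.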
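
One way around that limitation, as a rough sketch rather than a complete
solution: check for a BOM or the UTF-16 byte pattern of "<?" first, decode
accordingly, and only then apply the RE. The function name
declared_encoding is made up for illustration, and the sketch assumes
Python 3 bytes/str semantics. It does not attempt UTF-32 or EBCDIC.

```python
import codecs
import re

# Same pattern as in the quoted message, compiled once.
_DECL_RE = re.compile(r"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]")


def declared_encoding(data):
    """Return the encoding named in the XML declaration, or None.

    Hypothetical helper: accepts str or bytes. UTF-32 and EBCDIC
    input are not handled.
    """
    if isinstance(data, bytes):
        if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
            # The "utf-16" codec reads the BOM to pick the byte order
            # and drops it from the result.
            text = data.decode("utf-16", errors="replace")
        elif data.startswith(b"<\x00?\x00"):
            # No BOM, but "<?" laid out as little-endian 16-bit units.
            text = data.decode("utf-16-le", errors="replace")
        elif data.startswith(b"\x00<\x00?"):
            # No BOM, big-endian 16-bit units.
            text = data.decode("utf-16-be", errors="replace")
        else:
            # Assume an ASCII superset. Skip a UTF-8 BOM if present;
            # latin-1 maps every byte to one code point, so the ASCII
            # declaration survives whatever the real encoding is.
            if data.startswith(codecs.BOM_UTF8):
                data = data[len(codecs.BOM_UTF8):]
            text = data.decode("latin-1")
    else:
        text = data
    m = _DECL_RE.match(text)
    return m.group(1) if m else None
```

For example, declared_encoding(doc.encode("utf-16")) should return the
declared name for a document the bare RE would miss. Note that this still
only reports what the declaration says, which may not be how the bytes
are actually encoded.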

There's also this, for detecting the actual encoding, not necessarily what's
declared: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52257
...it's not perfect, though, as I noted in the comments.


