umlauts
Mark Tolonen
metolone+gmane at gmail.com
Sat Oct 17 16:16:45 EDT 2009
"Diez B. Roggisch" <deets at nospam.web.de> wrote in message
news:7jub5rF37divlU4 at mid.uni-berlin.de...
[snip]
> This is weird. I looked at the site in Firefox - and it was displayed
> correctly, including umlauts. Bringing up the info-dialog claims the page
> is UTF-8, the XML itself says so as well (implicit, through the missing
> declaration of an encoding) - but it clearly is *not* utf-8.
>
> One would expect google to be better at this...
>
> Diez
According to the XML 1.0 specification:
"Although an XML processor is required to read only entities in the UTF-8
and UTF-16 encodings, it is recognized that other encodings are used around
the world, and it may be desired for XML processors to read entities that
use them. In the absence of external character encoding information (such as
MIME headers), parsed entities which are stored in an encoding other than
UTF-8 or UTF-16 must begin with a text declaration..."
So UTF-8 and UTF-16 are the only encodings a parser may assume when there is
no XML declaration and no external encoding information. But here we do have
external character encoding information:
>>> import urllib
>>> f = urllib.urlopen("http://www.google.de/ig/api?weather=Muenchen")
>>> f.headers.dict['content-type']
'text/xml; charset=ISO-8859-1'
So the page seems correct: the HTTP header supplies the encoding (ISO-8859-1)
that the missing XML declaration does not.
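
In practice this means the body should be decoded with the charset from the
Content-Type header rather than assumed to be UTF-8. Below is a minimal
sketch of that step, written for the Python 3 standard library instead of
the Python 2 session above. The helper names (charset_from_content_type,
decode_body) and the sample bytes are made up for illustration, and the
UTF-8 fallback is my assumption for the case where the header names no
charset.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value (hypothetical helper).

    Falls back to `default` when the header carries no charset; that
    fallback is an assumption, chosen to match the XML default.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset lowercased, or None.
    return msg.get_content_charset() or default


def decode_body(raw_bytes, content_type):
    """Decode a response body using the charset the header declares."""
    return raw_bytes.decode(charset_from_content_type(content_type))


# Header value as reported by the server in the session above.
header = "text/xml; charset=ISO-8859-1"

# Sample bytes standing in for the response body (not real server output).
raw = "München".encode("iso-8859-1")

text = decode_body(raw, header)
```

Decoding the same sample bytes as UTF-8 would raise UnicodeDecodeError,
which matches Diez's observation that the data is clearly not UTF-8.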
-Mark