arian.kuschki at googlemail.com
Sat Oct 17 21:07:01 CEST 2009
Hm, yes, that is true. In Firefox, on the other hand, the response header is
"Content-Type: text/xml; charset=UTF-8"
On Sat 17, 13:16 -0700, Mark Tolonen wrote:
> "Diez B. Roggisch" <deets at nospam.web.de> wrote in message
> news:7jub5rF37divlU4 at mid.uni-berlin.de...
> >This is weird. I looked at the site in Firefox - and it was
> >displayed correctly, including umlauts. Bringing up the
> >info-dialog claims the page is UTF-8, the XML itself says so as
> >well (implicit, through the missing declaration of an encoding) -
> >but it clearly is *not* utf-8.
> >One would expect google to be better at this...
> According to the XML 1.0 specification:
> "Although an XML processor is required to read only entities in the
> UTF-8 and UTF-16 encodings, it is recognized that other encodings
> are used around the world, and it may be desired for XML processors
> to read entities that use them. In the absence of external character
> encoding information (such as MIME headers), parsed entities which
> are stored in an encoding other than UTF-8 or UTF-16 must begin with
> a text declaration..."
> So UTF-8 and UTF-16 are the defaults supported without an xml
> declaration in the absence of external encoding information. But we
> have external character encoding information:
> >>> import urllib
> >>> f = urllib.urlopen("http://www.google.de/ig/api?weather=Muenchen")
> >>> f.headers['content-type']
> 'text/xml; charset=ISO-8859-1'
> So the page seems correct.
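
The precedence described in the quoted spec text (external information such as the Content-Type header first, then the XML declaration, then the UTF-8 default) could be sketched roughly as follows. This is only an illustration in Python 3 using the standard library: the helper names charset_from_content_type and pick_xml_encoding are made up for this example, the sample body is invented rather than taken from the real service, and the sketch skips the BOM-based UTF-16 detection that a real XML processor performs.

```python
import re
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    # email.message.Message parses MIME-style header parameters for us.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, None if absent


def pick_xml_encoding(content_type, body):
    """Choose an encoding for an XML body (hypothetical helper).

    Order: charset from the HTTP header, then the encoding named in the
    XML declaration, then UTF-8. UTF-16 BOM sniffing is omitted.
    """
    charset = charset_from_content_type(content_type)
    if charset:
        return charset
    # Look for <?xml ... encoding="..."?> at the very start of the body.
    match = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', body)
    if match:
        return match.group(1).decode("ascii")
    return "utf-8"


# Invented sample body: Latin-1 bytes with no XML declaration.
body = '<weather city="München"/>'.encode("iso-8859-1")
encoding = pick_xml_encoding("text/xml; charset=ISO-8859-1", body)
text = body.decode(encoding)
```

With the header value from the interactive session above, the body decodes with its umlauts intact. If a client receives "charset=UTF-8" for the same bytes, as reported for Firefox, the same logic would pick UTF-8 and the decode would fail or garble the text.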