umlauts
Arian Kuschki
arian.kuschki at googlemail.com
Sat Oct 17 15:07:01 EDT 2009
Hm, yes, that is true. In Firefox, on the other hand, the response header is
"Content-Type: text/xml; charset=UTF-8"
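A minimal Python 3 sketch (not from this thread) of why the two headers lead to different results. It reads the charset parameter out of each Content-Type value with the standard-library email package and decodes the same bytes with it. The helper name charset_from_content_type, its UTF-8 fallback, and the sample body (the word "München" encoded as ISO-8859-1, as the header seen from Python declares) are assumptions for illustration only.

```python
from email.message import Message


def charset_from_content_type(value, default="utf-8"):
    """Hypothetical helper: return the charset parameter of a Content-Type
    header value, lowercased, or `default` (an assumed fallback) if absent."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset(default)


# Assumed sample body: "München" encoded as ISO-8859-1 (byte 0xFC for "ü").
body = "München".encode("iso-8859-1")

for header in ("text/xml; charset=ISO-8859-1", "text/xml; charset=UTF-8"):
    charset = charset_from_content_type(header)
    # errors="replace" substitutes U+FFFD for bytes that are invalid in `charset`
    print(header, "->", body.decode(charset, errors="replace"))
```

With the ISO-8859-1 charset the word comes back intact; with UTF-8 the 0xFC byte is not a valid UTF-8 start byte, so it becomes a U+FFFD replacement character (a strict decode would raise UnicodeDecodeError instead). The bytes only decode correctly when the declared charset matches how they were actually encoded.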
On Sat 17, 13:16 -0700, Mark Tolonen wrote:
>
> "Diez B. Roggisch" <deets at nospam.web.de> wrote in message
> news:7jub5rF37divlU4 at mid.uni-berlin.de...
> [snip]
> >This is weird. I looked at the site in Firefox, and it was
> >displayed correctly, including umlauts. Bringing up the
> >info dialog claims the page is UTF-8, and the XML itself says so as
> >well (implicitly, through the missing declaration of an encoding),
> >but it clearly is *not* UTF-8.
> >
> >One would expect Google to be better at this...
> >
> >Diez
>
> According to the XML 1.0 specification:
>
> "Although an XML processor is required to read only entities in the
> UTF-8 and UTF-16 encodings, it is recognized that other encodings
> are used around the world, and it may be desired for XML processors
> to read entities that use them. In the absence of external character
> encoding information (such as MIME headers), parsed entities which
> are stored in an encoding other than UTF-8 or UTF-16 must begin with
> a text declaration..."
>
> So UTF-8 and UTF-16 are the defaults supported without an XML
> declaration in the absence of external encoding information. But we
> have external character encoding information:
>
> >>> f = urllib.urlopen("http://www.google.de/ig/api?weather=Muenchen")
> >>> f.headers.dict['content-type']
> 'text/xml; charset=ISO-8859-1'
>
> So the page seems correct.
>
> -Mark
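The precedence described in the quoted XML 1.0 passage (external encoding information such as a MIME charset first, then the document's own declaration, otherwise UTF-8 or UTF-16) can be sketched in Python 3 as follows. This is a simplified illustration under stated assumptions, not a conforming XML processor: guess_xml_encoding is a hypothetical helper, the declaration is located with a regular expression, and only UTF-8 and UTF-16 byte order marks are recognized (the spec's non-normative autodetection rules cover more cases).

```python
import codecs
import re
from email.message import Message

# Simplified match for encoding="..." in a leading <?xml ...?> declaration.
_XML_DECL = re.compile(rb"<\?xml[^>]*?\bencoding\s*=\s*[\"']([A-Za-z0-9._-]+)[\"']")


def guess_xml_encoding(body, content_type=None):
    """Hypothetical helper: choose an encoding for the XML bytes `body`.

    Simplified precedence, loosely following XML 1.0:
    1. charset parameter of the external Content-Type header, if any;
    2. a UTF-8 or UTF-16 byte order mark;
    3. the encoding named in the XML declaration;
    4. otherwise UTF-8.
    """
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
        if charset:
            return charset
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the BOM when decoding
    if body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM to pick the byte order
    match = _XML_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


# Assumed sample: ISO-8859-1 bytes with no encoding in the XML declaration.
body = '<?xml version="1.0"?><city>München</city>'.encode("iso-8859-1")
encoding = guess_xml_encoding(body, "text/xml; charset=ISO-8859-1")
print(encoding, body.decode(encoding))
```

Given the header from the session above, the helper returns 'iso-8859-1' regardless of what the document itself declares, which is the sense in which external encoding information takes precedence; without the header, the same bytes would fall through to the UTF-8 default and fail to decode.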
--
More information about the Python-list mailing list