umlauts

Arian Kuschki arian.kuschki at googlemail.com
Sat Oct 17 15:07:01 EDT 2009


Hm yes, that is true. In Firefox, on the other hand, the response header is
"Content-Type: text/xml; charset=UTF-8"

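For what it's worth, here is a minimal sketch of how one could honour whatever charset the Content-Type header declares instead of assuming UTF-8. It is written for Python 3 and uses only the standard library. The helper names (charset_from_content_type, decode_body) and the UTF-8 fallback are my own assumptions for illustration, and the sample bytes are made up rather than fetched from the Google URL.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type header value.

    Falls back to `default` (an assumption, not something the server
    promised) when the header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset lowercased, or None.
    return msg.get_content_charset() or default


def decode_body(raw_bytes, header_value):
    """Decode a response body using the charset named in the header."""
    return raw_bytes.decode(charset_from_content_type(header_value))


# Made-up body bytes, encoded the way the urllib response claimed.
raw = "München".encode("iso-8859-1")
text = decode_body(raw, "text/xml; charset=ISO-8859-1")
```

Decoding the same bytes as UTF-8 raises UnicodeDecodeError, which would fit the symptom discussed in this thread: the body only decodes cleanly when the charset from the header that came with it is used.
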
On Sat 17, 13:16 -0700, Mark Tolonen wrote:

> 
> "Diez B. Roggisch" <deets at nospam.web.de> wrote in message
> news:7jub5rF37divlU4 at mid.uni-berlin.de...
> [snip]
> >This is weird. I looked at the site in Firefox - and it was
> >displayed correctly, including umlauts. Bringing up the
> >info-dialog claims the page is UTF-8, the XML itself says so as
> >well (implicit, through the missing declaration of an encoding) -
> >but it clearly is *not* utf-8.
> >
> >One would expect google to be better at this...
> >
> >Diez
> 
> According to the XML 1.0 specification:
> 
> "Although an XML processor is required to read only entities in the
> UTF-8 and UTF-16 encodings, it is recognized that other encodings
> are used around the world, and it may be desired for XML processors
> to read entities that use them. In the absence of external character
> encoding information (such as MIME headers), parsed entities which
> are stored in an encoding other than UTF-8 or UTF-16 must begin with
> a text declaration..."
> 
> So UTF-8 and UTF-16 are the defaults supported without an xml
> declaration in the absence of external encoding information.  But we
> have external character encoding information:
> 
> >>> f = urllib.urlopen("http://www.google.de/ig/api?weather=Muenchen")
> >>> f.headers.dict['content-type']
> 'text/xml; charset=ISO-8859-1'
> 
> So the page seems correct.
> 
> -Mark