Arian Kuschki arian.kuschki at
Sat Oct 17 21:07:01 CEST 2009

Hm, yes, that is true. In Firefox, on the other hand, the response header is
"Content-Type: text/xml; charset=UTF-8"
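
To make the precedence rule in the quoted XML 1.0 passage concrete, here is a minimal sketch in Python 3 (the session quoted below is Python 2). The function name `effective_xml_encoding` is hypothetical, and the precedence is deliberately simplified. A charset in the Content-Type header wins. Next comes an `encoding` in the XML declaration, then a byte-order mark, and otherwise UTF-8. It does not implement the full autodetection in the spec's Appendix F or HTTP's own default-charset rules for text/xml. Under this rule, the two headers reported in this thread give different answers for the same bytes, which is why the header actually sent with each response matters.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of an XML declaration such as
# <?xml version="1.0" encoding="ISO-8859-1"?>
_XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_xml_encoding(content_type, body):
    """Return the encoding to use for decoding the XML bytes in `body`.

    Hypothetical helper with simplified precedence (an assumption, not a
    complete implementation of the XML 1.0 rules):
    1. charset parameter of the Content-Type header (external information),
    2. a UTF-8 byte-order mark,
    3. the encoding named in the XML declaration,
    4. a UTF-16 byte-order mark,
    5. otherwise UTF-8, the spec's default.
    """
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lower-cased, or None if absent
        if charset:
            return charset
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    match = _XML_DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    return "utf-8"


if __name__ == "__main__":
    # A German word with u-umlaut and sharp s, encoded as ISO-8859-1;
    # the bytes 0xFC and 0xDF are not valid UTF-8 in this position.
    body = '<?xml version="1.0"?><t>Gr\u00fc\u00dfe</t>'.encode("iso-8859-1")
    enc = effective_xml_encoding("text/xml; charset=ISO-8859-1", body)
    print(enc)                      # iso-8859-1
    print(ascii(body.decode(enc)))  # ascii() keeps the output console-safe
    try:
        body.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("not valid UTF-8:", exc.reason)
```

With the header Python received, the sketch picks ISO-8859-1 and the umlauts decode. With the header Firefox reported, it would pick UTF-8, and decoding these particular bytes would fail.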

On Sat 17, 13:16 -0700, Mark Tolonen wrote:

> "Diez B. Roggisch" <deets at> wrote in message
> news:7jub5rF37divlU4 at
> [snip]
> >This is weird. I looked at the site in Firefox, and it was
> >displayed correctly, including umlauts. The info dialog
> >claims the page is UTF-8, and the XML itself says so as
> >well (implicitly, through the missing encoding declaration),
> >but it clearly is *not* UTF-8.
> >
> >One would expect Google to be better at this...
> >
> >Diez
> According to the XML 1.0 specification:
> "Although an XML processor is required to read only entities in the
> UTF-8 and UTF-16 encodings, it is recognized that other encodings
> are used around the world, and it may be desired for XML processors
> to read entities that use them. In the absence of external character
> encoding information (such as MIME headers), parsed entities which
> are stored in an encoding other than UTF-8 or UTF-16 must begin with
> a text declaration..."
> So UTF-8 and UTF-16 are the defaults supported without an XML
> declaration, in the absence of external encoding information. But we
> do have external character encoding information:
> >>> import urllib
> >>> f = urllib.urlopen("")
> >>> f.headers.dict['content-type']
> 'text/xml; charset=ISO-8859-1'
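> So the page seems correct: the header declares ISO-8859-1, and that
> external information takes precedence over the UTF-8 default.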
> -Mark


More information about the Python-list mailing list