Degree symbol (UTF-8 > ASCII)

Ben Hutchings do-not-spam-ben.hutchings at businesswebsoftware.com
Thu Apr 24 12:16:01 EDT 2003


In article <f7199550.0304160930.6cfeca65 at posting.google.com>, Peter Clark wrote:
> I'm working with a xml document which doesn't include an encoding, so
> it defaults to UTF-8. Of course, all of the text is ASCII, and likely
> to remain so. I would like to insert the degree symbol (chr(176)), but
> because this is outside the bounds (chr(128) is the limit), Python
> raises an XML error. What's the simplest way of getting including an
> unadorned degree symbol?

Since you mentioned an XML error, I'm assuming you're using some XML API
to read and write the file. In that case you want a Unicode string, and
the answer is:
    unichr(176)

chr(176) returns an ordinary (byte) string, which is interpreted in the
default encoding whenever it has to be converted to Unicode. That
encoding is ASCII by default, and ASCII defines only characters 0
through 127, so there is no character 176.  Python treats ordinary
strings as arbitrary bytes until you try to convert them, which is why
the error only shows up later.
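
To make the byte/character distinction concrete, here is a minimal
sketch. It assumes a Python 3 interpreter, where unichr() no longer
exists and chr() itself returns a Unicode character; under the Python 2
discussed in this thread, unichr(176) plays the role that chr(176) plays
below. The helper name is_ascii_encodable is made up for illustration.

```python
# Python 3 sketch (assumption); in Python 2 use unichr(176) instead.
degree = chr(176)          # code point U+00B0, DEGREE SIGN

# The same character becomes different bytes in different encodings.
utf8_bytes = degree.encode("utf-8")      # two bytes
latin1_bytes = degree.encode("latin-1")  # one byte, value 176


def is_ascii_encodable(text):
    """Return True if every character of text fits in ASCII (0-127)."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def decodes_as_ascii(raw):
    """Return True if the byte string raw is valid ASCII."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True
```

The raw byte 176 is fine as data, but it fails as soon as something
tries to interpret it as ASCII text, which matches the delayed error
described above.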

> Again, it's not necessary to preserve the UTF-8 encoding, but I'm not
> quite certain as to how to tell Python that the XML document is plain
> ASCII (or I guess in the case of the degree symbol, latin-1).

An XML document in some 8-bit or multibyte encoding other than UTF-8
(or UTF-16) *must* have the encoding specified in its XML declaration.
You should not try to override this.
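
As a rough illustration of why the declaration matters, the sketch
below parses the same content three ways. It assumes Python 3 and the
standard library's xml.etree.ElementTree, which is not necessarily the
XML API the original poster is using; the <temp> element and its
contents are invented for the example.

```python
import xml.etree.ElementTree as ET

# Latin-1 bytes with a matching encoding declaration: byte 0xB0 is
# decoded as the degree sign.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><temp>20\xb0C</temp>'
latin1_text = ET.fromstring(latin1_doc).text

# The same Latin-1 byte with no declaration: the parser assumes UTF-8,
# where a lone 0xB0 byte is invalid, so parsing fails.
undeclared_doc = b'<temp>20\xb0C</temp>'
try:
    ET.fromstring(undeclared_doc)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

# Genuine UTF-8 needs no declaration, since UTF-8 is the default.
utf8_doc = '<temp>20\u00b0C</temp>'.encode('utf-8')
utf8_text = ET.fromstring(utf8_doc).text
```

So the choice is between writing the degree sign as UTF-8 and leaving
the declaration off, or writing it in another encoding such as Latin-1
and declaring that encoding in the document.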



