Degree symbol (UTF-8 > ASCII)

Steven Taschuk staschuk at telusplanet.net
Wed Apr 16 14:50:30 EDT 2003


Quoth Peter Clark:
> I'm working with a xml document which doesn't include an encoding, so
> it defaults to UTF-8. Of course, all of the text is ASCII, and likely
> to remain so. I would like to insert the degree symbol (chr(176)), but
> because this is outside the bounds (chr(128) is the limit), Python
> raises an XML error. What's the simplest way of getting including an
> unadorned degree symbol? Again, it's not necessary to preserve the
> UTF-8 encoding, but I'm not quite certain as to how to tell Python
> that the XML document is plain ASCII (or I guess in the case of the
> degree symbol, latin-1).

I'm not sure what you're after.

On the one hand, you say the text is all ASCII "and likely to
remain so", but then immediately declare that you want to insert a
non-ASCII character, namely the degree symbol.  If the XML
document is to be in ASCII, it may not include chr(0xB0), since
ASCII has no such character.  End of story.

If you want this character, either your document must be in a
character encoding which includes it (such as UTF-8 or
ISO-8859-1), or it must use an XML numeric character reference,
"&#176;".
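
A minimal sketch of those two options, assuming Python 3 and the
standard xml.etree.ElementTree module (the <temp> element is made up
for illustration).  It encodes the same text once as UTF-8 and once as
pure ASCII with a numeric character reference, then checks that a
parser sees the same character either way:

```python
import xml.etree.ElementTree as ET

DEGREE = chr(176)  # U+00B0 DEGREE SIGN
text = "<temp>20" + DEGREE + "C</temp>"  # hypothetical example document

# Option 1: keep the document in UTF-8; the degree sign becomes
# the two bytes 0xC2 0xB0.
as_utf8 = text.encode("utf-8")

# Option 2: keep the bytes pure ASCII; anything outside ASCII is
# written as a numeric character reference.
as_ascii = text.encode("ascii", "xmlcharrefreplace")

# With no encoding declaration the parser assumes UTF-8, which also
# covers the pure-ASCII version.
parsed_utf8 = ET.fromstring(as_utf8).text
parsed_ascii = ET.fromstring(as_ascii).text

print(as_utf8)
print(as_ascii)
print(parsed_utf8 == parsed_ascii)
```

Either way the application reading the document gets the same text
back; only the bytes on disk differ.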

If you use ISO-8859-1 (aka Latin-1), you'll have to indicate so in
the XML declaration; if I were you I'd leave the document in
UTF-8.  (Among other things, XML processors are not required to
understand ISO-8859-1, but they are required to understand UTF-8.)
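
To see why the declaration matters, here is a rough sketch, again
assuming Python 3 and xml.etree.ElementTree (whose underlying parser
happens to understand ISO-8859-1; other processors need not).  The
<temp> element is hypothetical.  The same Latin-1 bytes parse with the
declaration and are rejected without it, since the lone byte 0xB0 is
not valid UTF-8:

```python
import xml.etree.ElementTree as ET

# In ISO-8859-1 the degree sign is the single byte 0xB0.
body = "<temp>20\u00b0C</temp>".encode("iso-8859-1")

# With the encoding stated in the XML declaration, the bytes parse.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
declared_text = ET.fromstring(declared).text

# Without a declaration the parser assumes UTF-8 and rejects 0xB0.
try:
    ET.fromstring(body)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(declared_text == "20\u00b0C")
print(undeclared_ok)
```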

-- 
Steven Taschuk                          staschuk at telusplanet.net
"Its force is immeasurable.  Even Computer cannot determine it."
                           -- _Space: 1999_ episode "Black Sun"




