Degree symbol (UTF-8 > ASCII)

Martin v. Löwis martin at
Fri Apr 18 10:45:28 CEST 2003

pc451 at (Peter Clark) writes:

> > And scale is a Unicode string, right?
>     Only because the XML document has no specified encoding, so it
> defaults to UTF-8, yes. But all the text is straight ASCII, except of
> course for the inclusion of the degree symbol.

No. UTF-8 is *not* Unicode. In Python, there are two data types: <type
'str'>, and <type 'unicode'>. The type str represents bytes (8 bits
per element), and the type unicode represents characters.

The string type can also be used to represent characters, but only if
you assume that you are using some encoding. UTF-8 is an encoding, and
so is Latin-1. A string encoded in UTF-8 is still a byte string, not a
character string. A Unicode object may contain characters that can be
encoded in ASCII, or it can contain characters that cannot be encoded
in ASCII.

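To make the distinction concrete, here is a small sketch using the
degree sign from this thread. The variable names (degree, utf8_bytes,
latin1_bytes, ascii_ok) are mine and purely illustrative. The code is
written to behave the same on Python 2 and on Python 3; note that
Python 3 renamed the two types, so the byte type is called bytes there
and the character type is called str.

```python
# One character: U+00B0 DEGREE SIGN, held in a character (Unicode) string.
degree = u"\u00b0"

# Encoding turns characters into bytes; the result depends on the encoding.
utf8_bytes = degree.encode("utf-8")      # two bytes: 0xC2 0xB0
latin1_bytes = degree.encode("latin-1")  # one byte:  0xB0

print(len(degree))        # number of characters
print(len(utf8_bytes))    # number of bytes in the UTF-8 form
print(len(latin1_bytes))  # number of bytes in the Latin-1 form

# Decoding with the matching encoding gives the character back.
assert utf8_bytes.decode("utf-8") == degree
assert latin1_bytes.decode("latin-1") == degree

# The degree sign is outside ASCII, so encoding it as ASCII fails.
try:
    degree.encode("ascii")
    ascii_ok = True
except UnicodeError:
    ascii_ok = False
print(ascii_ok)
```

The same character yields different byte sequences under different
encodings, which is why a byte string only has a meaning as text once
you know which encoding was used to produce it.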
If this is not the mental model that you have, you will have a hard
time understanding all the phenomena you observe, and I suggest

