Degree symbol (UTF-8 > ASCII)

Erik Max Francis max at alcyone.com
Wed Apr 16 21:56:34 EDT 2003


Peter Clark wrote:

>     Since the output is meant to be displayed by a font
> which is in essentially latin-1 encoding, I need to restrict the
> manner in which the degree symbol is displayed to one byte. Yet I
> cannot get it to behave, even though 'print chr(176)' works perfectly
> fine at the prompt. My suspicion is that the default encoding of the
> system is messing python up somewhere along the way--is there any way
> to tell it to just print the stupid character and not be concerned
> with the output?

I've come into this conversation late, but could it be that what's
confusing you is that UTF-8 and Latin-1 are not the same thing?  It
sounds like you want Latin-1 but are asking for UTF-8.  UTF-8 is a
variable-width encoding of Unicode: every character above U+007F is
written as a sequence of two or more bytes.  Latin-1 is a fixed-width
encoding, one byte per character, covering only U+0000 through U+00FF.
Both have the property that pure-ASCII data will be represented without
modification, but they aren't the same beast.  If you're converting to
UTF-8 and are puzzled why 8-bit data is expanding to multiple bytes,
then chances are UTF-8 isn't what you wanted.

>>> u = u'\xb0'    # Unicode string holding the degree symbol, U+00B0
>>> u.encode('latin-1')
'\xb0'
>>> u.encode('utf-8')
'\xc2\xb0'
>>> print u.encode('latin-1')
[the degree symbol]
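
For anyone trying this on Python 3, where str is always Unicode and
encode() returns a bytes object, here is a rough sketch of the same
comparison, plus one way to push exactly one byte per character to an
output stream.  The names DEGREE, encoded_forms and write_latin1 are
made up for illustration, and the BytesIO buffer only stands in for
whatever binary stream you actually write to.

```python
import io

DEGREE = "\u00b0"  # degree sign, U+00B0


def encoded_forms(text):
    """Return (latin-1 bytes, utf-8 bytes) for the given text."""
    return text.encode("latin-1"), text.encode("utf-8")


def write_latin1(stream, text):
    """Write text to a binary stream as Latin-1, one byte per character."""
    stream.write(text.encode("latin-1"))


latin1_bytes, utf8_bytes = encoded_forms(DEGREE)
print(len(latin1_bytes), len(utf8_bytes))  # one byte versus two bytes

# A BytesIO buffer stands in for the real output stream here.
buf = io.BytesIO()
write_latin1(buf, "37" + DEGREE + " N")
print(repr(buf.getvalue()))
```

To send the bytes to the terminal itself rather than a buffer, passing
sys.stdout.buffer as the stream should bypass the default text encoding,
assuming stdout has not been replaced by an object without a buffer
attribute.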

-- 
 Erik Max Francis / max at alcyone.com / http://www.alcyone.com/max/
 __ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE
/  \ It was involuntary.  They sank my boat.
\__/ John F. Kennedy (on how he became a war hero)
    Bosskey.net: Return to Wolfenstein / http://www.bosskey.net/rtcw/
 A personal guide to Return to Castle Wolfenstein.



