I like Unicode more than I used to...

Thu Feb 20 23:14:05 EST 2003

On Thursday, February 20, 2003, at 04:06  PM, Terry Hancock wrote:

> Hmm. Why do I need to do that? Is there no way to figure out how to
> print a unicode string when I'm running in a unicode capable terminal?
> Also, is

'Unicode capable terminal' actually means 'terminal that understands
UTF-8 encoded sequences of bytes'. A 'Unicode string' is an abstract
concept that can have a concrete implementation in any one of a number
of possible encodings. The only one that makes sense in a lot of cases
(when ASCII compatibility is required) is UTF-8, so people have tended
to use 'Unicode' and 'UTF-8' interchangeably and perpetuate the
confusion. I think this may have been caused by Java people, but I'm
not really sure.
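
To make the distinction concrete, here is a minimal sketch of one
abstract string turned into several different byte sequences. It
assumes a current Python 3 interpreter, where str is the Unicode string
and bytes is the sequence of bytes; the sample text and variable names
are mine, purely for illustration.

```python
# One abstract Unicode string, several concrete byte encodings.
text = "na\u00efve caf\u00e9"  # 10 characters, two of them non-ASCII

utf8 = text.encode("utf-8")        # non-ASCII characters take 2 bytes each here
latin1 = text.encode("latin-1")    # every character takes exactly 1 byte
utf16 = text.encode("utf-16-le")   # every character here takes 2 bytes

for name, data in (("utf-8", utf8), ("latin-1", latin1), ("utf-16-le", utf16)):
    print(name, len(data), data)

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8,
# which is the compatibility property mentioned above.
ascii_compatible = "Foo".encode("utf-8") == "Foo".encode("ascii")
print(ascii_compatible)
```

The three results are different bytes, but each decodes back to the
same string when the matching encoding name is used.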

It's just like how your old terminal would only correctly display
sequences of bytes that were encoded in Latin-1, and might display odd
results if you sent it a sequence of bytes encoded in windows-1250.
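
A small sketch of that mismatch, again assuming Python 3. The word
chosen is only an example; the point is that the receiving side
interprets the bytes with the wrong table and nothing raises an error.

```python
# Bytes produced under one encoding, interpreted under another.
text = "z\u0142oty"                   # contains U+0142, LATIN SMALL LETTER L WITH STROKE

sent = text.encode("windows-1250")    # what the sender actually transmits
shown = sent.decode("latin-1")        # what a Latin-1 terminal would make of it

print(repr(sent))
print(shown)                          # the stroked l comes out as a superscript three
print(sent.decode("windows-1250"))    # the right table recovers the original text
```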

> there a list somewhere of what the "".encode() method understands? I
> was unable to find one.  I just guessed that "utf-8" would work from
> the above example.  Is that extendable in Python, or is it compiled-in?

http://www.python.org/dev/doc/devel/lib/node125.html
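
If you only want to know whether a particular name is understood, one
option is to ask the codecs module directly rather than guess. This is
a rough sketch assuming Python 3; is_known_codec is a helper name I made
up, not part of the library.

```python
import codecs

def is_known_codec(name):
    """Return True if the codec registry can resolve this encoding name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

print(is_known_codec("utf-8"))
print(is_known_codec("windows-1250"))
print(is_known_codec("no-such-codec"))

# lookup() also reports the canonical name behind an alias.
print(codecs.lookup("utf8").name)
```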

> The last one does have a way to register new codecs in the codecs
> module -- can the string method use any codec defined there?  If so,
> how do you use it?

Like any other codec:

u'Foo'.encode('rot13') # Unicode string to 8-bit encoding
'Foo'.decode('rot13')  # 8-bit encoding to Unicode string
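
For a codec you register yourself, the pattern is the same once the
registry knows about it. Below is a minimal sketch assuming Python 3
and codecs.register() with a search function; the codec name
'reversedutf8' and the helper functions are hypothetical, invented only
to show the mechanism.

```python
import codecs

def _encode(text, errors="strict"):
    # Toy codec (hypothetical): the UTF-8 bytes in reverse order.
    data = text.encode("utf-8", errors)[::-1]
    return data, len(text)            # (output, amount of input consumed)

def _decode(data, errors="strict"):
    raw = bytes(data)                 # input may arrive as a buffer object
    return raw[::-1].decode("utf-8", errors), len(raw)

def _search(name):
    # Search functions are called with the requested encoding name and
    # return a CodecInfo, or None if they do not handle that name.
    if name == "reversedutf8":
        return codecs.CodecInfo(name=name, encode=_encode, decode=_decode)
    return None

codecs.register(_search)

encoded = "Foo".encode("reversedutf8")     # string method finds the new codec
decoded = encoded.decode("reversedutf8")
print(encoded, decoded)
```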

-- 
Stuart Bishop <zen at shangri-la.dropbear.id.au>
http://shangri-la.dropbear.id.au/