I like Unicode more than I used to...
Stuart Bishop
zen at shangri-la.dropbear.id.au
Thu Feb 20 23:14:05 EST 2003
On Thursday, February 20, 2003, at 04:06 PM, Terry Hancock wrote:
> Hmm. Why do I need to do that? Is there no way to figure out how to print a
> unicode string when I'm running in a unicode capable terminal? Also, is
'Unicode capable terminal' actually means 'terminal that understands
UTF-8 encoded sequences of bytes'. A 'Unicode string' is an abstract
concept that can have a concrete implementation as any one of a number
of possible encodings. The only one that makes sense in a lot of cases
(when ASCII compatibility is required) is UTF-8, so people have tended
to use 'Unicode' and 'UTF-8' interchangeably and perpetuate the
confusion. I think this may have been caused by Java people, but I'm
not really sure.
It's just like how your old terminal would only correctly display
sequences of bytes that were encoded in Latin-1, and might display odd
results if you sent it a sequence of bytes encoded in windows-1250.
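To make the difference between the abstract string and its encoded bytes concrete, here is a small sketch. The sample text is my own choice, and the byte literals assume a current Python 3 interpreter (the same calls exist in older versions, but the reprs look different):

```python
# One abstract Unicode string: 'cafe' with an acute accent on the 'e'.
text = u'caf\xe9'

# Two different concrete byte sequences for the same string.
utf8_bytes = text.encode('utf-8')      # e-acute becomes two bytes: C3 A9
latin1_bytes = text.encode('latin-1')  # e-acute becomes one byte: E9

# What a Latin-1-only terminal effectively does with UTF-8 bytes:
# it interprets each byte as its own Latin-1 character.
garbled = utf8_bytes.decode('latin-1')

print(utf8_bytes)
print(latin1_bytes)
print(garbled)
```

The last line prints two odd characters where the accented letter should be, which is the kind of result you get when the terminal's encoding and the bytes you send it disagree.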
> there a list somewhere of what the "".encode() method understands? I was
> unable to find one. I just guessed that "utf-8" would work from the above
> example. Is that extendable in Python, or is it compiled-in?
http://www.python.org/dev/doc/devel/lib/node125.html
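Besides the documentation page, you can also ask the interpreter itself. This is only a sketch, assuming a current Python 3; `is_known_encoding` is a helper name I made up for illustration, while `codecs.lookup` and `encodings.aliases.aliases` are the standard library pieces it relies on:

```python
import codecs
from encodings.aliases import aliases


def is_known_encoding(name):
    """Return True if the codec registry can resolve `name` (hypothetical helper)."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# Many spellings resolve to the same codec; .name gives the canonical one.
canonical = codecs.lookup('utf8').name

# The alias table maps alternative spellings to codec module names,
# so its values give a rough list of the bundled encodings.
bundled = sorted(set(aliases.values()))

print(canonical)
print(is_known_encoding('latin-1'), is_known_encoding('no-such-codec'))
print(len(bundled), 'codec modules reachable through the alias table')
```

The alias table is not exhaustive (codecs registered at run time will not appear in it), so treat `codecs.lookup` as the authoritative check.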
> The last one does have a way to register new codecs in the codecs module --
> can the string method use any codec defined there? If so, how do you use it?
Like any other codec:

    u'Foo'.encode('rot13') # Unicode string to 8-bit encoding
    'Foo'.decode('rot13') # 8-bit encoding to Unicode string
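For the "register your own" half of the question, here is a rough sketch of how a codec added through `codecs.register` becomes usable from the string methods. It assumes a current Python 3, and the codec name `x_reverse` and the three underscore-prefixed functions are invented purely for illustration; it is a toy, not a recommended codec design:

```python
import codecs


def _reverse_encode(text, errors='strict'):
    # Toy transformation: reverse the text, then store it as UTF-8 bytes.
    # Codec functions return (result, number_of_input_items_consumed).
    return text[::-1].encode('utf-8', errors), len(text)


def _reverse_decode(data, errors='strict'):
    # The input may arrive as a buffer object, so normalise it to bytes first.
    raw = bytes(data)
    return raw.decode('utf-8', errors)[::-1], len(raw)


def _search(name):
    # Search functions get the normalised codec name and return a
    # CodecInfo for names they handle, or None otherwise.
    if name == 'x_reverse':
        return codecs.CodecInfo(
            name='x_reverse',
            encode=_reverse_encode,
            decode=_reverse_decode,
        )
    return None


codecs.register(_search)

encoded = u'Foo'.encode('x_reverse')   # str -> bytes via the new codec
decoded = encoded.decode('x_reverse')  # bytes -> str via the new codec

# In Python 3, rot13 is a text-to-text codec, so it is reached through
# codecs.encode() rather than through the str method.
rotated = codecs.encode(u'Foo', 'rot13')

print(encoded, decoded, rotated)
```

Once the search function is registered, the string methods find the codec by name exactly as they do for the built-in ones.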
--
Stuart Bishop <zen at shangri-la.dropbear.id.au>
http://shangri-la.dropbear.id.au/