unicode by default

harrismh777 harrismh777 at charter.net
Thu May 12 02:31:16 EDT 2011


Ben Finney wrote:
> I'd phrase that as:

> * Text is a sequence of characters. Most inputs to the program,
>    including files, sockets, etc., contain a sequence of bytes.

> * Always know whether you're dealing with text or with bytes. No object
>    can be both.

> * In Python 2, ‘str’ is the type for a sequence of bytes. ‘unicode’ is
>    the type for text.

> * In Python 3, ‘str’ is the type for text. ‘bytes’ is the type for a
>    sequence of bytes.
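
A minimal sketch of that text/bytes split in Python 3. The sample string
is only an illustration and is not from the thread; it is written with
escapes so the source stays plain ASCII.

```python
# Python 3: str holds text (code points), bytes holds raw bytes.
text = "na\u00efve caf\u00e9"   # illustrative sample: "naïve café"
data = text.encode("utf-8")     # text -> bytes always needs an encoding

assert isinstance(text, str)
assert isinstance(data, bytes)

# 10 characters but 12 bytes: the two accented letters
# each take two bytes in UTF-8.
print(len(text), len(data))

# bytes -> text needs an encoding again.
assert data.decode("utf-8") == text

# No object can be both: mixing the two types raises TypeError.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)
```

In Python 2 the same roles are played by ‘unicode’ (text) and ‘str’
(bytes), as the list above says.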


That is very helpful... thanks


MRAB, Steve, John, Terry, Ben F, Ben K, Ian...
    ...thank you guys so much, I think I've got a better picture now of
what is going on... this is also one place where I don't think the books
(Lutz, Summerfield) are as clear as they need to be, at least for me.

So, UTF-16 / UTF-32 is INTERNAL only, for Python... and text in/out is
based on locale... in my case UTF-8... that is enormously helpful for
me. Understanding locale on this system is as mystifying as unicode was
in the first place.
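
A rough sketch of how to check that, assuming Python 3: ask the locale
which encoding text I/O falls back to, then bypass it by passing
encoding= explicitly. The scratch file name is hypothetical, and the
value printed for the locale encoding depends on the system.

```python
import locale
import os
import tempfile

# Encoding used for text files when open() is given no encoding=.
default_encoding = locale.getpreferredencoding(False)
print("locale encoding:", default_encoding)

# Hypothetical scratch file, just for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Passing encoding= explicitly avoids depending on the locale.
with open(path, "w", encoding="utf-8") as f:
    f.write("\u00e9")                      # one character: e-acute

with open(path, "rb") as f:                # binary mode: raw bytes, no decoding
    raw = f.read()

with open(path, encoding="utf-8") as f:    # text mode: bytes decoded to str
    text = f.read()

print(raw, repr(text))
```

The file holds two bytes on disk, while the program sees one character
once the bytes are decoded.
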
Well, after reading about unicode tonight (about four hours) I realize
that it's not really that hard... there are just a lot of details that
have to come together. Straightening out that whole Tower-of-Babel thing
is sure a pain in the butt.
I also was not aware that UTF-8 chars could be up to six (6) bytes long
from left to right in the original design (RFC 3629 now limits them to
four). I see now that the little-endianness I was ascribing to python
is just a function of hexdump... and I was a little disappointed to
find that hexdump does not support UTF-8, just ascii... doh.
Anyway, thanks again... I've got enough now to play around a bit...
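
For playing around with the byte lengths and byte order mentioned above,
a small sketch assuming Python 3.5 or later (for bytes.hex()). The
sample characters are arbitrary picks, not from the thread. The longest
UTF-8 sequence for any code point valid today is four bytes.

```python
# Sample characters chosen for illustration:
# "A", e-acute, the euro sign, and the musical G clef.
samples = ["A", "\u00e9", "\u20ac", "\U0001d11e"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # bytes.hex() lists the bytes in stream order, left to right.
    print("U+%04X" % ord(ch), len(encoded), encoded.hex())

# One, two, three and four bytes respectively.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# Byte order only matters for the multi-byte code units of
# UTF-16 / UTF-32; UTF-8 is a plain byte stream with no endianness.
euro_le = "\u20ac".encode("utf-16-le")   # low byte first
euro_be = "\u20ac".encode("utf-16-be")   # high byte first
print(euro_le.hex(), euro_be.hex())
```

Printing the bytes from Python this way sidesteps any byte grouping a
dump tool applies to its display.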

PS thanks Steve for that link, informative and entertaining too... Joel
says, "If you are a programmer . . . and you don't know the basics of
characters, character sets, encodings, and Unicode, and I catch you, I'm
going to punish you by making you peel onions for 6 months in a
submarine. I swear I will".     :)

kind regards,
m harris