unicode by default
harrismh777
harrismh777 at charter.net
Thu May 12 02:31:16 EDT 2011
Ben Finney wrote:
> I'd phrase that as:
> * Text is a sequence of characters. Most inputs to the program,
> including files, sockets, etc., contain a sequence of bytes.
> * Always know whether you're dealing with text or with bytes. No object
> can be both.
> * In Python 2, ‘str’ is the type for a sequence of bytes. ‘unicode’ is
> the type for text.
> * In Python 3, ‘str’ is the type for text. ‘bytes’ is the type for a
> sequence of bytes.
That is very helpful... thanks
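
For anyone following along, the text/bytes split in Ben's list can be sketched in a few lines. This is only a minimal illustration, assuming Python 3; the variable names and the sample string are made up for the example.

```python
# Python 3: str is text, bytes is a sequence of bytes.
text = "caf\u00e9"               # str: 4 characters (c, a, f, e-acute)
data = text.encode("utf-8")      # bytes: the UTF-8 encoding of that text

print(type(text).__name__, len(text))   # character count
print(type(data).__name__, len(data))   # byte count (e-acute takes 2 bytes)

# Decoding turns bytes back into text; you have to know the encoding.
round_trip = data.decode("utf-8")

# The two types do not mix implicitly: no object can be both.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

print("str + bytes allowed?", mixed_ok)
```

In Python 2 the same split exists under different names ('unicode' for text, 'str' for bytes), but there the two types could be mixed implicitly for ASCII data, which is much of why the distinction was easy to miss.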
MRAB, Steve, John, Terry, Ben F, Ben K, Ian...
...thank you guys so much, I think I've got a better picture now of
what is going on... this is also one place where I don't think the books
are as clear as they need to be at least for me...(Lutz, Summerfield).
So, UTF-16/UTF-32 is INTERNAL only, for Python... and text in/out is
based on locale... in my case UTF-8 ...that is enormously helpful for
me... understanding locale on this system is as mystifying as unicode is
in the first place.
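
A rough way to see which encoding the locale is handing to Python, and what choosing an encoding explicitly looks like, is sketched below. It assumes Python 3; the in-memory buffer just stands in for a file, and the names are only illustrative. The printed values depend on the system, so none are shown.

```python
import io
import locale
import sys

# What the locale suggests for text in/out (UTF-8 on many Linux systems).
preferred = locale.getpreferredencoding(False)
print("locale preferred encoding:", preferred)
print("stdout encoding:", sys.stdout.encoding)

# Text written through a text layer is encoded on the way out.
# Passing encoding= explicitly avoids depending on the locale at all.
buffer = io.BytesIO()
writer = io.TextIOWrapper(buffer, encoding="utf-8")
writer.write("\u03c0 \u2248 3.14")   # "pi is approximately 3.14" in symbols
writer.flush()

raw = buffer.getvalue()              # the bytes that would land in the file
print(len(raw), "bytes:", raw)
```

The same `encoding=` argument works with `open()`, which is usually the simplest way to stop caring what the locale says.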
Well, after reading about Unicode tonight (about four hours) I realize
that it's not really that hard... there are just a lot of details that have
to come together. Straightening out that whole tower-of-babel thing is
sure a pain in the butt.
I also was not aware that UTF-8 chars could be up to six (6) bytes long
from left to right. I see now that the little-endianness I was
ascribing to python is just a function of hexdump... and I was a little
disappointed to find that hexdump does not support UTF-8, just ASCII... doh.
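
The byte lengths and the byte-order question can be checked directly from Python. This is a small sketch assuming Python 3.5 or later (for `bytes.hex()`), with sample characters picked arbitrarily. One caveat: the original UTF-8 design allowed sequences of up to six bytes, but RFC 3629 restricts it to four, which is enough to cover every code point up to U+10FFFF.

```python
# One character from each UTF-8 length class.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
for ch, n in zip(samples, utf8_lengths):
    print("U+%04X takes %d byte(s) in UTF-8" % (ord(ch), n))

# UTF-8 is a plain byte sequence with no byte order of its own.
# UTF-16 is where little-endian vs big-endian actually matters.
euro = "\u20ac"
utf8_hex = euro.encode("utf-8").hex()
utf16_le_hex = euro.encode("utf-16-le").hex()
utf16_be_hex = euro.encode("utf-16-be").hex()
print("UTF-8:    ", utf8_hex)
print("UTF-16-LE:", utf16_le_hex)
print("UTF-16-BE:", utf16_be_hex)
```

If the swapped bytes came from hexdump's default output, that is because it prints two-byte words in host order; `hexdump -C` should show the bytes in file order.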
Anyway, thanks again... I've got enough now to play around a bit...
PS thanks Steve for that link, informative and entertaining too... Joe
says, "If you are a programmer . . . and you don't know the basics of
characters, character sets, encodings, and Unicode, and I catch you, I'm
going to punish you by making you peel onions for 6 months in a
submarine. I swear I will". :)
kind regards,
m harris