unicode by default

harrismh777 harrismh777 at charter.net
Wed May 11 17:37:49 EDT 2011


hi folks,
    I am puzzled by Unicode in general, and by Unicode in Python 
specifically. For one thing, what does it mean that Python 3.x uses 
Unicode by default? (I know what "default" means; I mean, what actually 
changed?)
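(Editor's sketch, not part of the original question.) One concrete way to see what changed: in Python 3.x the `str` type holds Unicode text and a separate `bytes` type holds raw bytes, whereas in 2.x plain `str` was bytes and text needed the separate `unicode` type. This minimal example assumes it is run under Python 3; the string "café" is an arbitrary choice.

```python
# Python 3.x: str holds Unicode code points; bytes holds raw 8-bit values.
text = "café"                  # a str of 4 code points
data = text.encode("utf-8")    # explicit str -> bytes conversion

print(type(text).__name__, len(text))   # str 4
print(type(data).__name__, len(data))   # bytes 5 ('é' needs two bytes in UTF-8)
print(data)                             # b'caf\xc3\xa9'

# bytes -> str is an explicit decode; Python 3 refuses to mix the two silently.
assert data.decode("utf-8") == text
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)
```

The exact wording of the TypeError message varies between 3.x releases, so it is not shown.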

    I think part of my problem is that I'm spoiled (American, ASCII 
heritage): I have either been knowingly stuck in ASCII, or been using 
UTF-8 without knowing it (just because the code points lined up). I am 
confused about the implications of moving to 3.x, because I keep reading 
that there are significant things to be aware of... what are they?

    On my installation, 2.6 reports sys.maxunicode as 1114111, while my 
2.7 and 3.2 installs each report 65535. So I am assuming that 2.6 was 
compiled with the UCS-4 (UTF-32) option for 4-byte Unicode, and that the 
default compile option for 2.7 and 3.2 (I didn't change anything) is 
UCS-2 (UTF-16), i.e. 2-byte Unicode. Do I understand this much 
correctly?
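(Editor's sketch.) The two values quoted above are the markers of the two build types: 65535 (0xFFFF) for a "narrow" build, where code points above U+FFFF are stored as UTF-16 surrogate pairs, and 1114111 (0x10FFFF) for a "wide" UCS-4 build. From Python 3.3 on (PEP 393) the narrow/wide distinction is gone and sys.maxunicode is always 1114111. The helper name `build_kind` below is hypothetical, made up for illustration; the script assumes Python 3 syntax.

```python
import sys

def build_kind(maxunicode):
    """Classify an interpreter by its sys.maxunicode value (hypothetical helper)."""
    if maxunicode == 0xFFFF:
        # Narrow build: characters above U+FFFF take two code units (a surrogate pair).
        return "narrow (UTF-16 code units, surrogate pairs above U+FFFF)"
    if maxunicode == 0x10FFFF:
        # Wide build (or any Python 3.3+): one code unit per code point.
        return "wide (UCS-4)"
    return "unknown"

print(sys.maxunicode, build_kind(sys.maxunicode))

# A character outside the Basic Multilingual Plane shows the practical difference:
# len() is 2 on a narrow build and 1 on a wide build or Python 3.3+.
astral = "\U0001D11E"   # MUSICAL SYMBOL G CLEF
print(len(astral))
```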

    The books say that .py source files are UTF-8 by default, and that 
3.x stores strings internally as either UCS-2 or UCS-4. If I use Python 
3.x's file handling with the default settings, what encoding will be 
used, and how will that affect the output?
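(Editor's sketch.) In 3.x, `open()` in text mode with no `encoding=` argument falls back to `locale.getpreferredencoding(False)`, so the bytes written depend on the platform and locale. Naming the encoding explicitly removes that dependence. The file name "sample.txt" and the sample text are arbitrary choices for this example.

```python
import locale
import os
import tempfile

# What open() uses in text mode when no encoding= is given (locale dependent).
default_enc = locale.getpreferredencoding(False)
print("default text encoding:", default_enc)

text = "naïve café"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")   # throwaway file for the demo

    # Naming the encoding makes the bytes on disk the same on every platform.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Binary mode shows exactly which bytes were written.
    with open(path, "rb") as f:
        raw = f.read()

    # Reading back with the same encoding recovers the original str.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print(raw)
assert raw == text.encode("utf-8")
assert round_trip == text
```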

    If I do not use any code points above 0x7F (the top of the ASCII 
range), does any of this matter anyway?
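(Editor's sketch.) For text made only of ASCII characters (U+0000 to U+007F), ASCII, Latin-1 and UTF-8 all produce identical bytes, so the encoding choice stays invisible. The difference appears at U+0080 to U+00FF, which Latin-1 stores in one byte and UTF-8 in two. The helper `encodings_agree` is hypothetical, written only for this illustration.

```python
def encodings_agree(text, codecs=("ascii", "latin-1", "utf-8")):
    """Return True if every codec in `codecs` encodes `text` to the same bytes."""
    encoded = set()
    for name in codecs:
        try:
            encoded.add(text.encode(name))
        except UnicodeEncodeError:
            return False          # some codec cannot represent the text at all
    return len(encoded) == 1

print(encodings_agree("hello, world"))   # True: pure ASCII
print(encodings_agree("café"))           # False: 'é' (U+00E9) is not ASCII
print("é".encode("latin-1"), "é".encode("utf-8"))   # b'\xe9' b'\xc3\xa9'
```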



Thanks.

kind regards,
m harris



