[Python-Dev] getting the UCS-2 representation of a unicode object

M.-A. Lemburg mal@lemburg.com
Mon, 20 May 2002 23:22:45 +0200


Martin v. Loewis wrote:
> "M.-A. Lemburg" <mal@lemburg.com> writes:
> 
> 
>>>I sincerely hope that you are mistaken in your belief that
>>>UCS-2 is so used.
>>
>>I should know...
> 
> 
> But, as John explains - you don't, right? Python uses UTF-16
> internally, see PEP 261.

This is really only an academic fuss. Python uses two bytes to store
Unicode code points, and the internals pay no special attention to
UTF-16 features such as surrogates; only a few codecs, which provide
the interfaces to the outside world, do.
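
To make the distinction concrete, here is a minimal sketch of what the
UTF-16 codec produces for a character inside and outside the BMP. The
helper name utf16_code_units is made up for illustration, and the
snippet assumes a current Python 3 interpreter, where the codec is the
only place the surrogate pair shows up. On a narrow build the two
units of the pair would simply be stored as two separate Py_UNICODE
elements, so len() of such a string would be 2.

```python
import struct

def utf16_code_units(text):
    """Return the 16-bit code units the UTF-16 codec emits for text."""
    raw = text.encode("utf-16-le")  # little-endian, no byte order mark
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

bmp = "\u20ac"         # EURO SIGN, inside the BMP
astral = "\U00010400"  # DESERET CAPITAL LETTER LONG I, outside the BMP

units_bmp = utf16_code_units(bmp)        # a single code unit
units_astral = utf16_code_units(astral)  # a high and a low surrogate

print([hex(u) for u in units_bmp])
print([hex(u) for u in units_astral])
```

The surrogate arithmetic lives entirely in the codec; nothing in the
storage layer needs to know that the two units belong together.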

BTW, PEP 261 uses the same terminology (UCS-2 instead of UTF-16)
and for the same reason:

         --enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses
                               wchar_t if it fits
         --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses
                               wchar_t if it fits
         --enable-unicode      same as "=ucs2"
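
Which of the two options an interpreter was built with can be checked
at runtime through sys.maxunicode, which PEP 261 introduced. A rough
sketch, with unicode_build_width as a hypothetical helper name; note
that on Python 3.3 and later (PEP 393) the value is always 0x10FFFF,
so the "narrow" branch only applies to older builds.

```python
import sys

def unicode_build_width(maxunicode=sys.maxunicode):
    """Classify a build by the largest code point one element can hold."""
    # 0xFFFF indicates a narrow (ucs2) Py_UNICODE,
    # 0x10FFFF a wide (ucs4) one.
    if maxunicode == 0xFFFF:
        return "narrow"
    return "wide"

print(unicode_build_width())
```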

If you're interested in more details, you should come to the
Unicode talk I'll give at EuroPython.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/