[Python-Dev] getting the UCS-2 representation of a unicode object
M.-A. Lemburg
mal@lemburg.com
Mon, 20 May 2002 23:22:45 +0200
Martin v. Loewis wrote:
> "M.-A. Lemburg" <mal@lemburg.com> writes:
>
>>> I sincerely hope that you are mistaken in your belief that
>>> UCS-2 is so used.
>>
>> I should know...
>
> But, as John explains - you don't, right? Python uses UTF-16
> internally, see PEP 261.
This is really only an academic distinction: Python uses two bytes to
store Unicode code points. The internals pay no special attention to
UTF-16 details such as surrogates; only a few codecs do, namely the
ones that provide interfaces to the outside world.
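
To make the surrogate point concrete, here is a minimal sketch of what a UTF-16 codec does at the boundary. It assumes a current Python 3 interpreter rather than the build under discussion, and the helper names utf16_code_units and is_surrogate are made up for illustration:

```python
def utf16_code_units(text):
    """Return the 16-bit code units produced by encoding text as UTF-16."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_surrogate(unit):
    """True if a 16-bit code unit lies in the surrogate range."""
    return 0xD800 <= unit <= 0xDFFF


# A BMP character (EURO SIGN) fits in a single code unit.
bmp_units = utf16_code_units("\u20ac")

# A character outside the BMP needs a surrogate pair.
astral_units = utf16_code_units("\U00010000")
```

Here the codec is the component that knows about surrogates: the BMP character comes out as one unit, the non-BMP character as a high and low surrogate.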
BTW, PEP 261 uses the same terminology (UCS-2 instead of UTF-16)
and for the same reason:
    --enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses
                          wchar_t if it fits
    --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses
                          wchar_t if it fits
    --enable-unicode      same as "=ucs2"
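
One way to see which of these a given interpreter was built with is sys.maxunicode. The following is only a sketch: describe_build is a hypothetical helper, and it assumes that a narrow build reports 0xFFFF and a wide build 0x10FFFF. Later Python 3 releases no longer have this build-time distinction and always report 0x10FFFF.

```python
import sys


def describe_build(maxunicode=sys.maxunicode):
    """Classify a build by the largest code point a single element can hold."""
    if maxunicode == 0xFFFF:
        return "narrow"   # assumed: built with --enable-unicode=ucs2
    if maxunicode == 0x10FFFF:
        return "wide"     # assumed: built with --enable-unicode=ucs4
    return "unknown"


print(describe_build())
```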
If you're interested in more details, you should come to the
Unicode talk I'll give at EuroPython.
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/