What encoding does u'...' syntax use?

Denis Kasak denis.kasak at gmail.com
Sat Feb 21 15:24:30 EST 2009


On Sat, Feb 21, 2009 at 9:10 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>>> I'm pretty much sure it is UCS-2 or UCS-4. (Yes, I know there is only a
>>> slight difference to UTF-16/UTF-32).
>>
>> I wouldn't call the difference that slight, especially between UTF-16
>> and UCS-2, since the former can encode all Unicode code points, while
>> the latter can only encode those in the BMP.
>
> Indeed. As Python *can* encode all characters even in 2-byte mode
> (since PEP 261), it seems clear that Python's Unicode representation
> is *not* strictly UCS-2 anymore.

Since we're already on the subject, I'm curious: why was UCS-2
chosen over plain UTF-16 or UTF-8 for Python's internal storage
in the first place?
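
To make the UCS-2/UTF-16 difference quoted above concrete, here is a
minimal sketch, assuming a current Python 3 interpreter (the helper
name utf16_code_units is made up for illustration, not anything from
the standard library). It lists the 16-bit code units UTF-16 needs for
a character inside the BMP and for one outside it:

```python
import struct

def utf16_code_units(ch):
    """Return the 16-bit code units UTF-16 uses for the string ch.

    Hypothetical helper for illustration only.
    """
    data = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

# U+00F6 lies inside the BMP: one 16-bit unit, same as UCS-2.
bmp = utf16_code_units("\u00f6")

# U+1D11E lies outside the BMP: UTF-16 needs a surrogate pair,
# which strict UCS-2 has no way to express.
astral = utf16_code_units("\U0001D11E")

print([hex(u) for u in bmp])     # ['0xf6']
print([hex(u) for u in astral])  # ['0xd834', '0xdd1e']
```

The second result is the surrogate pair that the quoted remark about
PEP 261 is referring to: a 2-byte build stores such characters as two
code units rather than rejecting them.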

-- 
Denis Kasak


