What encoding does u'...' syntax use?

Denis Kasak denis.kasak at gmail.com
Sat Feb 21 15:48:05 EST 2009


On Sat, Feb 21, 2009 at 9:45 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>>> Indeed. As Python *can* encode all characters even in 2-byte mode
>>> (since PEP 261), it seems clear that Python's Unicode representation
>>> is *not* strictly UCS-2 anymore.
>>
>> Since we're already discussing this, I'm curious - why was UCS-2
>> chosen over plain UTF-16 or UTF-8 in the first place for Python's
>> internal storage?
>
> You mean, originally? Originally, the choice was only between UCS-2
> and UCS-4; choice was in favor of UCS-2 because of size concerns.
> UTF-8 was ruled out easily because it doesn't allow constant-size
> indexing; UTF-16 essentially for the same reason (plus there was
> no point to UTF-16, since there were no assigned characters outside
> the BMP).

Yes, I failed to realise how long ago the Unicode data type was
originally implemented. :-)
Thanks for the explanation.
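
For anyone following along, here is a rough sketch of the size and
indexing trade-off described above. It is my own illustration, not
anything taken from the interpreter's source, and it is written for
Python 3 rather than the 2.x builds discussed in this thread. The
names samples, encoded_sizes and surrogate_pair are made up for the
example. It shows that UTF-8 and UTF-16 need a varying number of bytes
per character (so indexing cannot be a simple multiply), while a
4-byte encoding is fixed-width, and how a character outside the BMP
splits into a UTF-16 surrogate pair.

```python
# Illustrative sketch only; the names here are hypothetical, not a CPython API.

# One character each from: ASCII, Latin-1, the rest of the BMP, outside the BMP.
samples = ["a", "\u00f6", "\u20ac", "\U0001d11e"]


def encoded_sizes(ch):
    """Return the number of bytes one character needs in three encodings."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The explicit "-le" variants avoid counting a byte order mark.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


def surrogate_pair(ch):
    """Split a non-BMP character into its UTF-16 high and low surrogates."""
    offset = ord(ch) - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


sizes = {ch: encoded_sizes(ch) for ch in samples}
high, low = surrogate_pair("\U0001d11e")

for ch in samples:
    print("U+%04X" % ord(ch), sizes[ch])
print("surrogates: %04X %04X" % (high, low))
```

Only the utf-32 column stays constant across all four characters,
which is the constant-size indexing property mentioned above; the
utf-16 column is constant only as long as everything stays in the BMP.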

-- 
Denis Kasak


