What encoding does u'...' syntax use?
"Martin v. Löwis"
martin at v.loewis.de
Sat Feb 21 15:45:09 EST 2009
>> Indeed. As Python *can* encode all characters even in 2-byte mode
>> (since PEP 261), it seems clear that Python's Unicode representation
>> is *not* strictly UCS-2 anymore.
>
> Since we're already discussing this, I'm curious - why was UCS-2
> chosen over plain UTF-16 or UTF-8 in the first place for Python's
> internal storage?
You mean, originally? Originally, the choice was only between UCS-2
and UCS-4; UCS-2 was chosen because of size concerns.
UTF-8 was ruled out easily because it doesn't allow constant-size
indexing; UTF-16 essentially for the same reason (plus there was
no point to UTF-16 at the time, since there were no assigned
characters outside the BMP).
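
To illustrate the indexing point, here is a minimal sketch in Python 3 (not
the interpreter's actual implementation). The sample characters and the
helper names widths() and nth_code_point_fixed() are made up for this
example. It shows that UTF-8 and UTF-16 use a varying number of bytes per
code point, while a fixed-width form lets you find the n-th code point by
simple multiplication.

```python
# Four sample code points: "A", "ö", "€", and U+1D11E (musical G clef),
# the last of which lies outside the BMP.
text = "A\u00f6\u20ac\U0001d11e"


def widths(s, encoding):
    """Return the number of bytes each code point of s occupies in encoding."""
    return [len(ch.encode(encoding)) for ch in s]


# Variable width: the byte offset of the n-th code point depends on
# everything that comes before it.
utf8_widths = widths(text, "utf-8")
utf16_widths = widths(text, "utf-16-le")

# Fixed width: every code point occupies exactly 4 bytes.
utf32_widths = widths(text, "utf-32-le")


def nth_code_point_fixed(data, n, unit_size=4):
    """Look up the n-th code point in a UTF-32-LE buffer by offset arithmetic.

    Hypothetical helper: it assumes data is valid UTF-32-LE without a BOM.
    """
    start = n * unit_size
    return data[start:start + unit_size].decode("utf-32-le")


print(utf8_widths)
print(utf16_widths)
print(utf32_widths)
print(nth_code_point_fixed(text.encode("utf-32-le"), 2))
```

With only BMP characters, UTF-16 and UCS-2 produce the same fixed 2-byte
units, which is why UTF-16 offered nothing extra back then.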
Regards,
Martin