Flexible string representation, unicode, typography, ...
steve+comp.lang.python at pearwood.info
Thu Aug 30 18:00:52 CEST 2012
On Thu, 30 Aug 2012 07:02:24 -0400, Roy Smith wrote:
> In article <503f0e45$0$9416$c3e8da3$76491128 at news.astraweb.com>,
> Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
>> The only thing which is innovative here is that instead of the Python
>> compiler declaring that "all strings will be stored in UCS-2", the
>> compiler chooses an implementation for each string as needed. So some
>> strings will be stored internally as UCS-4, some as UCS-2, and some as
>> ASCII (which is a standard, but not the Unicode consortium's standard).
> Is the implementation smart enough to know that x == y is always False
> if x and y are using different internal representations?
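The per-string choice described in the quoted text can be sketched in a few lines of Python. This is a minimal sketch, not CPython's C implementation. It assumes PEP 393's three storage widths of 1, 2, or 4 bytes per code point, and the function name storage_width is hypothetical. In PEP 393 the one-byte form covers every code point below 256 (Latin-1), of which ASCII is a subset.

```python
# Hypothetical model of a PEP 393-style per-string storage choice:
# find the widest code point in the string, then pick the narrowest
# width that can hold it. This is not CPython's actual C code.

def storage_width(s: str) -> int:
    """Return the bytes per character a PEP 393-style store would use for s."""
    widest = max(map(ord, s), default=0)  # empty string -> 0
    if widest < 0x100:      # fits in one byte (Latin-1; ASCII is a subset)
        return 1
    if widest < 0x10000:    # fits in UCS-2 (Basic Multilingual Plane)
        return 2
    return 4                # needs UCS-4

for text in ["spam", "caf\u00e9", "\u20ac100", "\U0001F40D"]:
    print(repr(text), storage_width(text))
```

Because the width is decided per string, "spam" and "café" each take one byte per character while a single emoji forces the whole string to four.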
But x == y is not necessarily False just because x and y have
different representations. There may be circumstances where two strings
have different internal representations even though their content is the
same, so it's an unsafe optimization to automatically treat them as
unequal.
The closest existing equivalent here is the relationship between ints and
longs in Python 2. 42 == 42L even though they have different internal
representations and take up a different amount of space.
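The point about equality and representation can be made concrete with a toy model, in the same spirit as 42 == 42L. This is a minimal sketch assuming a hypothetical StoredStr type that packs code points at 1, 2, or 4 bytes each. It is not how CPython stores or compares strings. It shows that a "different widths means unequal" shortcut gives the wrong answer as soon as the same text can be stored at two different widths.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical model: a string stored as packed code points plus a width.
@dataclass(frozen=True)
class StoredStr:
    width: int    # bytes per code point: 1, 2 or 4
    data: bytes   # code points packed little-endian at that width

    @classmethod
    def encode(cls, text: str, width: int) -> StoredStr:
        # int.to_bytes raises OverflowError if a code point doesn't fit.
        return cls(width, b"".join(ord(c).to_bytes(width, "little") for c in text))

    def code_points(self) -> list[int]:
        return [int.from_bytes(self.data[i:i + self.width], "little")
                for i in range(0, len(self.data), self.width)]

def equal_by_content(a: StoredStr, b: StoredStr) -> bool:
    # Safe: compares what the strings contain, whatever their widths.
    return a.code_points() == b.code_points()

def equal_with_width_shortcut(a: StoredStr, b: StoredStr) -> bool:
    # Only correct if every string is always stored at its narrowest width.
    if a.width != b.width:
        return False
    return a.data == b.data

narrow = StoredStr.encode("spam", 1)
wide = StoredStr.encode("spam", 4)   # same text, wider storage
print(equal_by_content(narrow, wide), equal_with_width_shortcut(narrow, wide))
```

The shortcut is only sound if the implementation guarantees that equal content always ends up with the same width; without that guarantee, comparison has to look at the code points themselves.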
My expectation is that the initial implementation of PEP 393 will be
relatively unoptimized, and over the next few releases it will get more
efficient. That's usually the way these things go.