Flexible string representation, unicode, typography, ...
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Fri Aug 31 08:32:25 EDT 2012
On Thu, 30 Aug 2012 16:44:32 -0400, Terry Reedy wrote:
> On 8/30/2012 12:00 PM, Steven D'Aprano wrote:
>> On Thu, 30 Aug 2012 07:02:24 -0400, Roy Smith wrote:
[...]
>>> Is the implementation smart enough to know that x == y is always False
>>> if x and y are using different internal representations?
>
> Yes, after checking lengths, and in the same circumstances, x != y is True.
[snip C code]
Thanks Terry for looking that up.
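Here is a toy Python model of the shortcut Terry describes. It assumes PEP 393-style widths of 1, 2 or 4 bytes per character. `kind` and `model_equal` are hypothetical names, not CPython functions. The real string object stores its width, but this model recomputes it from the content each time.

```python
def kind(s: str) -> int:
    """Bytes per character in a PEP 393-style canonical layout (model only)."""
    widest = max(map(ord, s), default=0)
    if widest < 0x100:
        return 1  # ASCII / Latin-1
    if widest < 0x10000:
        return 2  # rest of the Basic Multilingual Plane
    return 4      # astral characters


def model_equal(x: str, y: str) -> bool:
    # Cheapest check first: strings of different lengths are never equal.
    if len(x) != len(y):
        return False
    # Canonical strings of different widths must differ: the wider one holds
    # at least one character that the narrower one cannot contain.
    if kind(x) != kind(y):
        return False
    # Same length and same width: compare code point by code point.
    return all(a == b for a, b in zip(x, y))
```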
> 'a in s' is also False if a's chars are wider than s's chars.
Now that's a nice optimization!
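The same idea applies to containment. This sketch uses the same hypothetical PEP 393-style `kind` model. If the needle needs a wider layout than the haystack, it contains a character the haystack cannot hold, so the search can be skipped. `model_contains` is an illustrative name only.

```python
def kind(s: str) -> int:
    """Bytes per character in a PEP 393-style canonical layout (model only)."""
    widest = max(map(ord, s), default=0)
    if widest < 0x100:
        return 1
    if widest < 0x10000:
        return 2
    return 4


def model_contains(s: str, sub: str) -> bool:
    # A wider needle cannot occur in a narrower haystack.
    if kind(sub) > kind(s):
        return False
    # Otherwise fall back to an ordinary substring search.
    return sub in s
```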
[...]
>> But x == y is not necessarily False just because x and y have
>> different representations. There may be circumstances where two strings
>> have different internal representations even though their content is
>> the same, so it's an unsafe optimization to automatically treat them as
>> unequal.
>
> I am sure that str objects are always in canonical form once visible to
> Python code. Note that unready (non-canonical) objects are rejected by
> the rich comparison function.
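Canonical form can be observed from Python code, though only in a CPython-specific way. This sketch assumes CPython 3.3 or later, where `sys.getsizeof` reflects the per-character width. Slicing the ASCII part out of a string that needs 2 bytes per character appears to produce a string in the narrow layout, the same size as one written directly as ASCII. Other implementations may report different sizes.

```python
import sys

wide = "abc\u20ac"   # the euro sign forces 2 bytes per character
narrowed = wide[:3]  # the remaining content is pure ASCII
plain = "abc"

# If the slice is stored canonically, it costs the same as the literal "abc".
print(sys.getsizeof(wide), sys.getsizeof(narrowed), sys.getsizeof(plain))
```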
That's one thing that I'm unclear about -- under what circumstances will
a string be in compact versus non-compact form? Reading between the
lines, I guess that a lot of the complexity of the implementation only
occurs while a string is being built. E.g. if you have Python code like
this:
''.join(str(x) for x in something) # a generator expression
Python can't tell how much space to allocate for the string -- it doesn't
know either the overall length of the string or the width of the
characters. So I presume that there is string builder code for dealing
with that, and that it involves resizing blocks of memory.
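The string-builder guess can be modelled as follows. This is a sketch under stated assumptions, not CPython's actual builder code. `StringBuilder` is a hypothetical class that keeps code points in an `array` of 1, 2 or 4-byte items. It grows the buffer as chunks arrive and re-copies everything into a wider buffer only when a chunk needs more bytes per character. It assumes the `'I'` typecode is 4 bytes wide and falls back to `'L'` if it is not.

```python
from array import array

# Typecodes for 1-, 2- and 4-byte code units ('I' is 4 bytes on common platforms).
_WIDE = "I" if array("I").itemsize == 4 else "L"
_TYPECODES = {1: "B", 2: "H", 4: _WIDE}


def kind(text: str) -> int:
    """Bytes per character in a PEP 393-style canonical layout (model only)."""
    widest = max(map(ord, text), default=0)
    if widest < 0x100:
        return 1
    if widest < 0x10000:
        return 2
    return 4


class StringBuilder:
    """Toy builder: length and width are unknown until all chunks are seen."""

    def __init__(self) -> None:
        self.kind = 1
        self.buf = array(_TYPECODES[1])

    def append(self, chunk: str) -> None:
        needed = kind(chunk)
        if needed > self.kind:
            # Widening: copy everything written so far into a wider buffer.
            self.buf = array(_TYPECODES[needed], self.buf.tolist())
            self.kind = needed
        # extend() grows (reallocates) the underlying memory block as needed.
        self.buf.extend(map(ord, chunk))

    def build(self) -> str:
        return "".join(map(chr, self.buf))
```

With a generator, the builder would be fed one piece at a time, for example `for x in something: builder.append(str(x))`, so it can only widen and grow after the fact.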
But if you do this:
''.join([str(x) for x in something]) # a list comprehension
Python could scan the list first, find out the widest char, and allocate
exactly the amount of space needed for the string. Even in Python 2,
joining a list comp is much faster than joining a gen expression.
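The list case allows a two-pass strategy, sketched here with the same hypothetical PEP 393-style model. `join_exact` is an illustrative name, not a real API. The first pass computes the total length and the widest character without copying anything. The buffer is then allocated once at its exact size and width, and the second pass copies each piece into place. A generator would have to be turned into a list first for the first pass to be possible.

```python
from array import array
from typing import Sequence, Tuple

_WIDE = "I" if array("I").itemsize == 4 else "L"
_TYPECODES = {1: "B", 2: "H", 4: _WIDE}


def kind(text: str) -> int:
    """Bytes per character in a PEP 393-style canonical layout (model only)."""
    widest = max(map(ord, text), default=0)
    if widest < 0x100:
        return 1
    if widest < 0x10000:
        return 2
    return 4


def join_exact(pieces: Sequence[str]) -> Tuple[str, int]:
    # Pass 1: total length and widest layout, without copying any data.
    total = sum(len(p) for p in pieces)
    width = max((kind(p) for p in pieces), default=1)
    code = _TYPECODES[width]
    # Allocate the final buffer exactly once, at its exact size and width.
    buf = array(code, [0]) * total
    # Pass 2: copy each piece into its slot; no resizing ever happens.
    pos = 0
    for p in pieces:
        buf[pos:pos + len(p)] = array(code, map(ord, p))
        pos += len(p)
    return "".join(map(chr, buf)), width
```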
--
Steven