Flexible string representation, unicode, typography, ...

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Aug 31 14:32:25 CEST 2012


On Thu, 30 Aug 2012 16:44:32 -0400, Terry Reedy wrote:

> On 8/30/2012 12:00 PM, Steven D'Aprano wrote:
>> On Thu, 30 Aug 2012 07:02:24 -0400, Roy Smith wrote:
[...]
>>> Is the implementation smart enough to know that x == y is always False
>>> if x and y are using different internal representations?
> 
> Yes, after checking lengths, and in the same circumstances, x != y is True.
[snip C code]

Thanks Terry for looking that up.

> 'a in s' is also False if a chars are wider than s chars.

Now that's a nice optimization!
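
To make the idea concrete, here is a rough pure-Python model of that 
shortcut. The names char_width and contains are made up for illustration, 
and the 1/2/4-byte thresholds are the ones described in PEP 393; the real 
check lives in C and may well differ in detail.

```python
def char_width(s):
    """Bytes per character a PEP 393 string would need to hold s."""
    top = max(map(ord, s), default=0)  # highest code point; 0 if s is empty
    if top < 0x100:
        return 1   # ASCII / Latin-1 range
    if top < 0x10000:
        return 2   # rest of the Basic Multilingual Plane
    return 4       # astral characters


def contains(needle, haystack):
    """Model of 'needle in haystack' with the width shortcut up front."""
    if char_width(needle) > char_width(haystack):
        # A narrow haystack cannot hold any of the wider characters,
        # so there is no need to search at all.
        return False
    return needle in haystack


print(contains('\u20ac', 'abc'))       # euro sign needs 2 bytes, 'abc' only 1
print(contains('b', 'abc'))
```

The shortcut only ever answers False early; whenever the widths are 
compatible it falls through to the ordinary search.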

[...]
>> But x == y is not necessarily always False just because they have
>> different representations. There may be circumstances where two strings
>> have different internal representations even though their content is
>> the same, so it's an unsafe optimization to automatically treat them as
>> unequal.
> 
> I am sure that str objects are always in canonical form once visible to
> Python code. Note that unready (non-canonical) objects are rejected by
> the rich comparison function.

That's one thing that I'm unclear about -- under what circumstances will 
a string be in compact versus non-compact form? Reading between the 
lines, I guess that a lot of the complexity of the implementation only 
occurs while a string is being built. E.g. if you have Python code like 
this:

''.join(str(x) for x in something)  # a generator expression

Python can't tell how much space to allocate for the string -- it doesn't 
know either the overall length of the string or the width of the 
characters. So I presume that there is string builder code for dealing 
with that, and that it involves resizing blocks of memory.
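
As a guess at what such a builder has to cope with, here is a toy model. 
ToyBuilder and everything in it is hypothetical, not CPython's actual 
code: it only counts how often a buffer would need reallocating, either 
because it ran out of room or because a wider character turned up.

```python
class ToyBuilder:
    """Toy model of a growable string buffer; not CPython's real code."""

    def __init__(self):
        self.width = 1      # bytes per character currently in use
        self.capacity = 0   # characters the buffer has room for
        self.length = 0     # characters actually stored
        self.reallocs = 0   # number of modelled reallocations
        self.parts = []     # the text itself, kept simply as a list

    @staticmethod
    def width_of(piece):
        top = max(map(ord, piece), default=0)
        return 1 if top < 0x100 else 2 if top < 0x10000 else 4

    def append(self, piece):
        need_width = max(self.width, self.width_of(piece))
        need_len = self.length + len(piece)
        if need_width > self.width or need_len > self.capacity:
            # Wider characters or no room left: model a reallocation,
            # doubling the capacity to amortise future growth.
            self.width = need_width
            self.capacity = max(need_len, 2 * self.capacity)
            self.reallocs += 1
        self.length = need_len
        self.parts.append(piece)

    def result(self):
        return ''.join(self.parts)


builder = ToyBuilder()
for piece in ['ab', 'cd', '\u20ac', 'x']:
    builder.append(piece)
print(builder.result(), builder.width, builder.reallocs)
```

The doubling strategy is just one common choice for amortising growth; 
I have not checked what the real builder does.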

But if you do this:

''.join([str(x) for x in something])  # a list comprehension

Python could scan the list first, find out the widest char, and allocate 
exactly the amount of space needed for the string. Even in Python 2, 
joining a list comp is much faster than joining a gen expression.
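
Anyone who wants to check the timing claim on their own machine can use 
something like the following. The function name compare_join and the 
sizes are arbitrary choices of mine, and the numbers will vary by version 
and platform. Note that CPython's str.join first turns any argument that 
is not already a list or tuple into a list, so the generator version pays 
for the generator machinery on top of the same join.

```python
import timeit


def compare_join(n=100000, repeat=5):
    """Return best-of-`repeat` timings in seconds for both spellings."""
    data = range(n)
    return {
        'list comprehension': min(timeit.repeat(
            lambda: ''.join([str(x) for x in data]),
            number=1, repeat=repeat)),
        'generator expression': min(timeit.repeat(
            lambda: ''.join(str(x) for x in data),
            number=1, repeat=repeat)),
    }


if __name__ == '__main__':
    for label, seconds in compare_join().items():
        print('%-22s %.4f s' % (label, seconds))
```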



-- 
Steven

