[Python-Dev] Wild Idea for the Year

Tim Peters tim.one at comcast.net
Thu Jun 24 20:00:21 EDT 2004


[Raymond Hettinger, on lazy string cat]
> ...
> With little effort on the C side, there is an opportunity to be the
> first dynamic language with O(n) behavior for serial string
> concatenations -- one less thing to teach, one step towards scalability.

Perl has mutable strings, and appending is done in-place.  I don't know
whether they do all the memory mgmt tricks needed to guarantee worst-case
linear behavior, but in practice stuff like this doesn't visibly slow down
as time goes on (while it slows down dramatically in Python):

$a = 'x';
while (1) {
     $a .= 'x';
     if (length($a) % 1000 == 0) {
         print length($a), "\n";
     }
}

Icon has immutable strings, but append-a-bunch-of-stuff-to-a-new-string is
*often* linear-time there too.  That's an implementation trick, relying on a
compacting garbage collector so that new allocations are done just by
incrementing the high-water mark at the end of the current free area.  In
practice it often turns out to be the case that catenating strings a & b
into a finds a and b to be the "last two" memory regions allocated, and then
the total cost of catenation is building a new (fixed-size) string
descriptor, spanning the combined regions.  So a compacting garbage
collector, the internal use of string descriptors, and *not* sticking a NUL
byte on the end of strings all conspire to make this possible.

Java uses mutable string buffers under the covers, so avoids quadratic-time
behavior too.  The compiler conspires to make that happen.

Python isn't unique in having a problem to solve here, but it's not the
first to think of it either.

>> But wait, I think it won't fly at all unless you make ob_sval a pointer
>> to separately allocated memory.  Otherwise, how could _autojoin()
>> possibly "fix" the string without allocating the memory for it?

> PyString_Resize().

There isn't such a thing.  The function you're thinking of is
_PyString_Resize(), and it's in the private API for good reason:  it's
dangerous, *because* it can need to move the string object.  Note that the
implementation aborts unless the refcount on the string object is 1.  It's
intended to be used internally, and with no possibility of Python-level code
"seeing" a string object between the time it's created and the time
_PyString_Resize() returns.

> ...
> That being said, I didn't miss that you hate the idea, so I'll craft a
> recipe and drop it :-(

Because of the prohibition against relocating visible Python objects, this
isn't so easy to worm around in Python.  "".join(strings) is just a rite of
Python passage.





More information about the Python-Dev mailing list