

What is wrong with the following code?
    def join(*iterable):
        s = ""
        for e in iterable:
            s = s + e
        return s
What if that code could run as fast as ''.join(iterable)? How much existing code would start to perform better? Would Python be easier to teach and write? Would the resulting code look more natural and obvious?
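For a rough sense of the gap, here is a hypothetical timing sketch (not measurements from this thread; exact numbers vary by platform and Python version, but the growth rates are the point):

    # Quadratic versus linear copying for repeated concatenation.
    import time

    def concat_loop(pieces):
        s = ""
        for e in pieces:
            s = s + e             # each pass copies everything built so far
        return s

    def concat_join(pieces):
        return "".join(pieces)    # result is sized once and copied once

    pieces = ["x"] * 50000
    for f in (concat_loop, concat_join):
        t = time.time()
        f(pieces)
        print "%s: %.2f seconds" % (f.__name__, time.time() - t)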
Without expanding the size of the string structure, I think it is possible to implement something equivalent to the following proposal (which for clarity, does add two words to the structure). Roughly, the idea is:
Add two PyObject pointers to the structure, component0 and component1, and initialize them to NULL.
Alter string_concat(a, b) to return a new string object with:

    ob_sval = '\0'
    ob_size = len(a) + len(b)
    component0 = a    // be sure to INCREF(a)
    component1 = b    // be sure to INCREF(b)
To the start of every string method except string_len() and string_concat(), add a macro that expands to:
if (self->component0 != NULL) _autojoin(self);
Write an _autojoin(self) method whose goal in life is to recursively fill out the string from the components:
* resize self to be big enough for an ob_sized string
* treat component0 and component1 as a binary tree (with only the leaves holding real strings) and recursively (in-order) find the leaves and copy their string data into self->ob_sval. DECREF each component after traversing it.
* set self->component0 to NULL to mark self as being fully joined.
The recursion depth should not be worrisome. In the typical case, one side of the tree will be a leaf and will not recurse further. It gets more interesting if the user writes things like b=a+a or has a nested series like c=a+b; f=d+e; g=c+f. The recursion and reference counting trivialize the handling of those cases.
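As a toy illustration only, here is a rough Python model of that bookkeeping (the real proposal modifies the C string object in place; DeferredStr, _autojoin and the other names below are purely illustrative):

    # Toy model: an unevaluated concatenation holding a component tree.
    class DeferredStr(object):
        def __init__(self, component0, component1):
            self.component0 = component0   # str or DeferredStr
            self.component1 = component1
            self.sval = None               # character data, filled in lazily

        def __add__(self, other):
            return DeferredStr(self, other)        # concatenation is O(1)

        def __radd__(self, other):
            return DeferredStr(other, self)

        def _autojoin(self):
            parts = []
            self._collect(parts)                   # in-order walk of the tree
            self.sval = "".join(parts)
            self.component0 = self.component1 = None   # mark as fully joined
            return self.sval

        def _collect(self, parts):
            # Only the leaves hold real string data.
            for comp in (self.component0, self.component1):
                if isinstance(comp, DeferredStr):
                    if comp.sval is not None:
                        parts.append(comp.sval)    # already joined, e.g. b = a + a
                    else:
                        comp._collect(parts)
                else:
                    parts.append(comp)

        def __str__(self):
            if self.sval is None:
                self._autojoin()
            return self.sval

Starting from a wrapped first piece, e.g. s = DeferredStr("", "") followed by s = s + e in a loop, keeps each concatenation O(1) and pays the single O(n) join only when the value is actually needed.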
An actual implementation striving to save the two additional structure fields could use the ob_sstate field to indicate a deferred string (my name for one that has not yet had ob_sval filled out). The ob_sval array itself would be given room to hold two object pointers for the components and access to those pointers would be cast to (PyObject *).
If this works out, it would be an innovation. The join([a,b]) versus a+=b problem also exists in several other languages. AFAICT, Python would have been the first to solve it.
As a nice side benefit, __builtin__.sum() would no longer need to reject the case where the inputs were strings.

Raymond Hettinger
It goes flat against one of my *original* design goals for strings, which was to be really simple, obviously correct code that was also very fast for the common case. Your idea adds some extra (constant) time to all string ops, and quite a bit of complexity.

There are lots of places where knowledge of string internals is assumed, including 3rd party code using a few macros, all of which would have to be fixed.

But wait, I think it won't fly at all unless you make ob_sval a pointer to separately allocated memory. Otherwise, how could _autojoin() possibly "fix" the string without allocating the memory for it? I am sure that making string objects contain a pointer to the actual string data instead of having it contiguously allocated with the header will slow things down.

--Guido van Rossum (home page: http://www.python.org/~guido/)

Your idea adds some extra (constant) time to all string ops, and quite a bit of complexity.
It adds a single if NULL check to each op (about the same cost as PyString_Check). Complexity (recursive joining) is confined to a single function. Meanwhile, the code for str.__add__() becomes simpler. That's basically the whole show.
There are lots of places where knowledge of string internals is assumed, including 3rd party code using a few macros, all of which would have to be fixed.
Then save it for Py3.0, or not. The idea is to make things easier for the Python programmer, beginner or pro. With little effort on the C side, there is an opportunity to be the first dynamic language with O(n) behavior for serial string concatenations -- one less thing to teach, one step towards scalability.
But wait, I think it won't fly at all unless you make ob_sval a pointer to separately allocated memory. Otherwise, how could _autojoin() possibly "fix" the string without allocating the memory for it?
PyString_Resize().

BTW, there is a proof-of-concept demo patch with UserString at: www.python.org/sf/976162

Also, there is an alternative approach of having str.__add__() return a string proxy. This would avoid issues with 3rd party code.

That being said, I didn't miss that you hate the idea, so I'll craft a recipe and drop it :-(

Raymond

But wait, I think it won't fly at all unless you make ob_sval a pointer to separately allocated memory. Otherwise, how could _autojoin() possibly "fix" the string without allocating the memory for it?
PyString_Resize().
This moves the string in memory, and therefore only works when there is exactly one reference to the object (the one you pass in). Otherwise you'd have to find and patch up all references to the object.
BTW, there is a proof-of-concept demo patch with UserString at: www.python.org/sf/976162
That's using a Python class, which doesn't have the resizing problem because it is implemented using a reference to the actual string data.
Also, there is an alternative approach of having str.__add__() return a string proxy. This would avoid issues with 3rd party code.
That being said, I didn't miss that you hate the idea, so I'll craft a recipe and drop it :-(
Hate is too strong, but I think it is unnecessary encroachment. I worry about Python losing its KISS principle.

Python 2.2 has been ported to the Nokia Series 60 platform, which has something like 8-16 MB of RAM available. I'm sure the footprint growth of Python 2.3 and 2.4 gives those developers nightmares already when they are thinking of tracking later versions...

--Guido van Rossum (home page: http://www.python.org/~guido/)

[Raymond Hettinger, on lazy string cat]
... With little effort on the C side, there is an opportunity to be the first dynamic language with O(n) behavior for serial string concatenations -- one less thing to teach, one step towards scalability.
Perl has mutable strings, and appending is done in-place. I don't know whether they do all the memory mgmt tricks needed to guarantee worst-case linear behavior, but in practice stuff like this doesn't visibly slow down as time goes on (while it slows down dramatically in Python):

    $a = 'x';
    while (1) {
        $a .= 'x';
        if (length($a) % 1000 == 0) {
            print length($a), "\n";
        }
    }

Icon has immutable strings, but append-a-bunch-of-stuff-to-a-new-string is *often* linear-time there too. That's an implementation trick, relying on a compacting garbage collector so that new allocations are done just by incrementing the high-water mark at the end of the current free area. In practice it often turns out to be the case that catenating strings a & b into a finds a and b to be the "last two" memory regions allocated, and then the total cost of catenation is building a new (fixed-size) string descriptor, spanning the combined regions. So a compacting garbage collector, the internal use of string descriptors, and *not* sticking a NUL byte on the end of strings all conspire to make this possible.

Java uses mutable string buffers under the covers, so avoids quadratic-time behavior too. The compiler conspires to make that happen. Python isn't unique in having a problem to solve here, but it's not the first to think of it either.
But wait, I think it won't fly at all unless you make ob_sval a pointer to separately allocated memory. Otherwise, how could _autojoin() possibly "fix" the string without allocating the memory for it?
PyString_Resize().
There isn't such a thing. The function you're thinking of is _PyString_Resize(), and it's in the private API for good reason: it's dangerous, *because* it can need to move the string object. Note that the implementation aborts unless the refcount on the string object is 1. It's intended to be used internally, and with no possibility of Python-level code "seeing" a string object between the time it's created and the time _PyString_Resize() returns.
... That being said, I didn't miss that you hate the idea, so I'll craft a recipe and drop it :-(
Because of the prohibition against relocating visible Python objects, this isn't so easy to worm around in Python. "".join(strings) is just a rite of Python passage.
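For reference, the rite of passage in question is the usual accumulate-then-join idiom (nothing new here, just the standard spelling):

    # Collect the pieces in a list, then join once at the end.
    def build(pieces):
        parts = []
        for e in pieces:
            parts.append(e)       # list append is amortized O(1)
        return "".join(parts)     # one final O(n) copy into the result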

What is wrong with the following code?
def join(*iterable): s = "" for e in iterable: s = s + e return s
[snip implementation specific description]

That would be terribly nifty, but it begs the question: If we do this for strings (an immutable sequence type), do we do this for tuples (another immutable sequence type)?

Likely not on the tuple side: considering Python's reliance on tuples in virtually every (is it every?) function call, additional overhead is additional overhead, and rarely do I see people doing tuple concatenation.

I do like the possibility that Python can be the first language to solve the repeated string concatenation problem. It would also reduce the need to tell newbies in c.l.py "don't do string concatenation, use ''.join(lst)".

- Josiah

Josiah> That would be terribly nifty, but it begs the question:
Josiah> If we do this for strings (an immutable sequence type), do we do
Josiah> this for tuples (another immutable sequence type)?

No. Concatenating tuples occurs rarely. Concatenating strings occurs all the time.

Skip

Hello Raymond,

On Sat, Jun 19, 2004 at 03:40:07PM -0400, Raymond Hettinger wrote:
* resize self to be big enough for an ob_sized string
How would you do that? I can't see how you can resize strings in-place, and if the strings are pre-allocated to have the correct length you only solve one side of the problem -- the copy overhead -- and not the other one -- allocating and deallocating larger and larger blocks of memory. Solving half the problem would already be nice.

Note however that code everywhere expects strings to have characters in their ob_sval field, accessing it through the PyString_AS_STRING() macro. You could fix that macro too, but you would have to carefully monitor the performance impact.
If this works out, it would be an innovation. The join([a,b]) versus a+=b problem also exists in several other languages. AFAICT, Python would have been the first to solve it.
*cough* Psyco *cough* implementation versus language *cough* I don't think Python can pretend to have put a lot of research effort into its string objects. Someone mentioned C++'s ropes, for example. Moreover there are a number of papers out there about which kind of structures are best suited in which situations, which should probably be taken into account too. (Psyco uses over-allocated buffers.) Armin
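To make "over-allocated buffers" concrete, a toy sketch of the idea follows (illustrative only; it is not Psyco's actual implementation, and a Python list stands in for the raw character buffer):

    # Grow capacity geometrically so reallocation happens rarely, making
    # n appends cost O(n) overall instead of O(n**2).
    class OverAllocatedBuffer:
        def __init__(self):
            self.parts = []        # stand-in for the real char buffer
            self.length = 0        # characters in use
            self.capacity = 0      # characters allocated

        def append(self, s):
            needed = self.length + len(s)
            if needed > self.capacity:
                # over-allocate by at least doubling; C code would realloc() here
                self.capacity = max(2 * self.capacity, needed)
            self.parts.append(s)
            self.length = needed

        def getvalue(self):
            return "".join(self.parts)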

Hello,

On Sat, Jun 19, 2004 at 03:40:07PM -0400, Raymond Hettinger wrote:
s = "" for e in iterable: s = s + e
Hum, it may not be impossible after all. http://www.python.org/sf/980695 -> an idea that seems to work and only involves 3 pages of obfuscated code in new well-isolated functions of ceval.c. stringobject.c is unmodified. Armin

At 02:38 PM 6/27/04 +0100, Armin Rigo wrote:
Hello,
On Sat, Jun 19, 2004 at 03:40:07PM -0400, Raymond Hettinger wrote:
s = "" for e in iterable: s = s + e
Hum, it may not be impossible after all.
http://www.python.org/sf/980695 -> an idea that seems to work and only involves 3 pages of obfuscated code in new well-isolated functions of ceval.c. stringobject.c is unmodified.
Looks interesting. It appears the cost is added to cases that are likely to have longer execution times anyway (i.e. the slow_add and slow_iadd cases).

I notice that string_concatenate() doesn't cover the STORE_DEREF case, however. Was that intentional?

Last question: is this actually faster when 'e' is replaced by an expression that causes memory allocation? That is, isn't this *still* going to be an O(n**2) operation if the string has to be relocated on each addition?

[Phillip J. Eby, on "s = ''; for e in iterable: s = s + e" and http://www.python.org/sf/980695]
...
Last question: is this actually faster when 'e' is replaced by an expression that causes memory allocation? That is, isn't this *still* going to be an O(n**2) operation if the string has to be relocated on each addition?
It can't guarantee worst-case linear time, regardless of whether e causes memory allocation. That's because it relies on the platform realloc() to get more memory (after the string gets too big for pymalloc), so its behavior depends on what the platform realloc() does.

If the string gets "big enough", a number of realloc() implementations will move the string "to the end" of memory, and grow it thereafter via sbrk() calls. If e causes memory allocation too, it's likely to get satisfied by free blocks "before the end" of memory (for example, from the big block released at the time the string first got moved to the end). But there's no consistency across platforms in how the native realloc() behaves.

BTW, so long as the string object is small enough to be handled by pymalloc, it guarantees *to* copy the entire string object for each 8 characters added (pymalloc can never extend in-place when an 8-byte boundary is crossed).

Hello Phillip,

On Sun, Jun 27, 2004 at 06:03:06PM -0400, Phillip J. Eby wrote:
Looks interesting. It appears the cost is added to cases that are likely to have longer execution times anyway. (i.e. the slow_add and slow_iadd cases.)
Yes. I didn't time it, but I expect that a lot of applications would see a small increase in performance just because special-casing + even between small strings is faster than doing all this PyNumber_Add() madness.
I notice that string_concatenate() doesn't cover the STORE_DEREF case, however. Was that intentional?
Oops, thanks. I overlooked this one.
Last question: is this actually faster when 'e' is replaced by an expression that causes memory allocation? That is, isn't this *still* going to be an O(n**2) operation if the string has to be relocated on each addition?
Apparently not, according to a few tests. I suspect the memory allocator does some kind of over-allocation automatically, though I'm not sure how much Python's and Linux's allocators each contribute to this result. However, I noticed that it doesn't work for expressions like s=s+t+u, because it is associated as s=(s+t)+u. s+=t+u works, though. Armin
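To illustrate that last point about spelling (exactly which forms the patch catches is determined by the patch itself, not by this sketch):

    # With the sf/980695 approach, the spelling of the concatenation matters.
    t, u = "y", "z"

    s = ""
    for i in range(1000):
        s = s + t          # helped: s is rebound right after the add
        # s += t           # helped as well

    s2 = ""
    for i in range(1000):
        # s2 = s2 + t + u  # not helped: parsed as s2 = (s2 + t) + u
        s2 += t + u        # helped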
participants (9)
- Armin Rigo
- Guido van Rossum
- Josiah Carlson
- Phillip J. Eby
- Raymond Hettinger
- Raymond Hettinger
- Skip Montanaro
- Tim Peters
- Tim Peters