What is wrong with the following code?
def join(*iterable): s = "" for e in iterable: s = s + e return s
What if that code could run as fast as ''.join(iterable)? How much existing code would start to perform better? Would Python be easier to teach and write? Would the resulting code look more natural and obvious?
Without expanding the size of the string structure, I think it is possible to implement something equivalent to the following proposal (which for clarity, does add two words to the structure). Roughly, the idea is:
Add two PyObject pointers to the structure, component0 and component1, and initialize them to NULL.
Alter string_concat(a, b) to return a new string object with: ob_sval = '\0' ob_size = len(a) + len(b) component0 = a // be sure to INCREF(a) component1 = b // be sure to INCREF(b)
To the start of every string method except string_len() and string_concat(), add a macro that expands to:
if (self->component0 != NULL) _autojoin(self);
Write an _autojoin(self) method who's goal in life is to recursively fill-out the string from the components:
* resize self to be big enough for an ob_sized string * treat component0 and component1 as a binary tree (with only the leaves holding real strings) and recursively (in-order) find the leaves and copy their string data into self->ob_sval. DECREF each component after traversing it. * set self->component0 to NULL to mark self as being fully joined.
The recursion depth should not be worrisome. In the typical case, one side of the tree will be a leaf and will not recurse further. It gets more interesting if the user writes things like b=a+a or has a nested series like c=a+b; f=d+e; g=c+f. The recursion and reference counting trivializes the handling of those cases.
An actual implementation striving to save the two additional structure fields could use the ob_sstate field to indicate a deferred string (my name for one that has not yet had ob_sval filled out). The ob_sval array itself would be given room to hold two object pointers for the components and access to those pointers would be cast to (PyObject *).
If this works out, it would be an innovation. The join([a,b]) versus a+=b problem also exists in several other languages. AFAICT, Python would have been the first to solve it.
As a nice side benefit, __builtin__.sum() would no longer need to reject the case where the inputs were strings.