[Python-Dev] Building strings with +=

Jeremy Hylton jeremy at alum.mit.edu
Thu Jun 24 09:08:39 EDT 2004

On Mon, 2004-06-21 at 12:43, Raymond Hettinger wrote:
> Without expanding the size of the string structure, I think it is
> possible to implement something equivalent to the following proposal
> (which for clarity, does add two words to the structure).  Roughly, the
> idea is:

I have written many applications that store millions of strings.  I
don't know how keen I am on adding more eight more bytes of storage.  An
eight-byte string currently consumes 28 bytes of storage; the proposal
would bump it up to 36 bytes.
> Add two PyObject pointers to the structure, component0 and component1,
> and initialize them to NULL.
> Alter string_concat(a, b) to return a new string object with:
>     ob_sval = '\0'
>     ob_size = len(a) + len(b)
>     component0 = a         // be sure to INCREF(a)
>     component1 = b         // be sure to INCREF(b)

It sounds like you're proposing a string implementation know as
"ropes."  See:
Are did I misunderstand?

I think there's some merit to the idea.  My initial reaction is that the
data structure seems a bit complex.  Strings are nice and simple. 
Another potential problem is that a lot of code purports to understand
the internal representation of strings; that is, they use

It would be pretty easy to develop an alternative string implementation
and do some performance tests without integrating it into the core. 
That would identify the gross characteristics.  I assume most of the
strings used internally, e.g. variable and attribute names, would almost
always be simple strings and, thus, wouldn't be affected much by a
different implementation.


More information about the Python-Dev mailing list