[Python-Dev] Building strings with +=

Jeremy Hylton jeremy at alum.mit.edu
Thu Jun 24 09:08:39 EDT 2004


On Mon, 2004-06-21 at 12:43, Raymond Hettinger wrote:
> Without expanding the size of the string structure, I think it is
> possible to implement something equivalent to the following proposal
> (which for clarity, does add two words to the structure).  Roughly, the
> idea is:

I have written many applications that store millions of strings.  I
don't know how keen I am on adding more eight more bytes of storage.  An
eight-byte string currently consumes 28 bytes of storage; the proposal
would bump it up to 36 bytes.
 
> Add two PyObject pointers to the structure, component0 and component1,
> and initialize them to NULL.
> 
> Alter string_concat(a, b) to return a new string object with:
>     ob_sval = '\0'
>     ob_size = len(a) + len(b)
>     component0 = a         // be sure to INCREF(a)
>     component1 = b         // be sure to INCREF(b)

It sounds like you're proposing a string implementation know as
"ropes."  See:
    http://www.sgi.com/tech/stl/ropeimpl.html
    http://citeseer.ist.psu.edu/boehm95ropes.html
Are did I misunderstand?

I think there's some merit to the idea.  My initial reaction is that the
data structure seems a bit complex.  Strings are nice and simple. 
Another potential problem is that a lot of code purports to understand
the internal representation of strings; that is, they use
PyString_AS_STRING.

It would be pretty easy to develop an alternative string implementation
and do some performance tests without integrating it into the core. 
That would identify the gross characteristics.  I assume most of the
strings used internally, e.g. variable and attribute names, would almost
always be simple strings and, thus, wouldn't be affected much by a
different implementation.

Jeremy





More information about the Python-Dev mailing list