[Python-Dev] Building strings with +=
Jeremy Hylton
jeremy at alum.mit.edu
Thu Jun 24 09:08:39 EDT 2004
On Mon, 2004-06-21 at 12:43, Raymond Hettinger wrote:
> Without expanding the size of the string structure, I think it is
> possible to implement something equivalent to the following proposal
> (which for clarity, does add two words to the structure). Roughly, the
> idea is:
I have written many applications that store millions of strings. I
don't know how keen I am on adding more eight more bytes of storage. An
eight-byte string currently consumes 28 bytes of storage; the proposal
would bump it up to 36 bytes.
> Add two PyObject pointers to the structure, component0 and component1,
> and initialize them to NULL.
>
> Alter string_concat(a, b) to return a new string object with:
> ob_sval = '\0'
> ob_size = len(a) + len(b)
> component0 = a // be sure to INCREF(a)
> component1 = b // be sure to INCREF(b)
It sounds like you're proposing a string implementation know as
"ropes." See:
http://www.sgi.com/tech/stl/ropeimpl.html
http://citeseer.ist.psu.edu/boehm95ropes.html
Are did I misunderstand?
I think there's some merit to the idea. My initial reaction is that the
data structure seems a bit complex. Strings are nice and simple.
Another potential problem is that a lot of code purports to understand
the internal representation of strings; that is, they use
PyString_AS_STRING.
It would be pretty easy to develop an alternative string implementation
and do some performance tests without integrating it into the core.
That would identify the gross characteristics. I assume most of the
strings used internally, e.g. variable and attribute names, would almost
always be simple strings and, thus, wouldn't be affected much by a
different implementation.
Jeremy
More information about the Python-Dev
mailing list