Speeding up: s += "string"

Beni Cherniavsky cben at techunix.technion.ac.il
Tue Apr 15 07:38:47 EDT 2003


Lulu of the Lotus-Eaters wrote on 2003-04-15:

> The topic of the O() problems of the "s += 'string'" operation has been
> discussed in several recent threads.  The aspect that seems not to have
> really come up is "why not FIX it?"
>
I'm +1 if it is feasible ;-).

> It seems to me that the behavior CAN be improved.  If strings were to
> use a preallocation strategy--similar to what lists do--MANY string
> appends could avoid a memory copy on the original string.  The intial
> value could stay where it was, and the extra characters could live in
> the already-available contiguous space.  Obviously, the utilized length
> would need to be stored somewhere.  Once in a while, of course, the
> preallocation would fill up; in that case you'd need to do a costly
> strcopy()... but this could be avoided MOST of the time, ISTM.
>
Don't forget that strings are immutable and you need the old object to
still represent the old string.  I see two approaches:

1. Each string object points to a string buffer and has an end index
   and multiple strings can share the same buffer.  Then it makes
   perfect sense to also have a start index.  The real gain is from
   allowing substrings (slicing) to be O(1)!  The hard part is memory
   management, so that when you retain only a small part of a big
   string, the unneed memory could be reclaimed.  Ironically, this
   works well for substrings but not for concatenation because of::

     s = "some long string"
     print s + "a"
     print s + "b"

   (you can only grow the initial string for one of them).  This might
   not be a problem because this usage is rare.

2. Lazy concatenation: string + string returns a proxy object that
   simply points to both; its materialized to a real string only when
   needed (initially on any other string operation but it's easy to
   implement things like iteration without requiring it).  Similar
   approaches could work for lists and dictionaries, BTW.  For the
   later I mean the non-commutative union of two dicts where one has
   priority and the other is the fallback; this would be handy for
   cases where consing to alists is used in LISP...

-- 
Beni Cherniavsky <cben at tx.technion.ac.il>





More information about the Python-list mailing list