Speeding up: s += "string"
cben at techunix.technion.ac.il
Tue Apr 15 13:38:47 CEST 2003
Lulu of the Lotus-Eaters wrote on 2003-04-15:
> The topic of the O() problems of the "s += 'string'" operation has been
> discussed in several recent threads. The aspect that seems not to have
> really come up is "why not FIX it?"
I'm +1 if it is feasible ;-).
> It seems to me that the behavior CAN be improved. If strings were to
> use a preallocation strategy--similar to what lists do--MANY string
> appends could avoid a memory copy on the original string. The intial
> value could stay where it was, and the extra characters could live in
> the already-available contiguous space. Obviously, the utilized length
> would need to be stored somewhere. Once in a while, of course, the
> preallocation would fill up; in that case you'd need to do a costly
> strcopy()... but this could be avoided MOST of the time, ISTM.
Don't forget that strings are immutable and you need the old object to
still represent the old string. I see two approaches:
1. Each string object points to a string buffer and has an end index
and multiple strings can share the same buffer. Then it makes
perfect sense to also have a start index. The real gain is from
allowing substrings (slicing) to be O(1)! The hard part is memory
management, so that when you retain only a small part of a big
string, the unneed memory could be reclaimed. Ironically, this
works well for substrings but not for concatenation because of::
s = "some long string"
print s + "a"
print s + "b"
(you can only grow the initial string for one of them). This might
not be a problem because this usage is rare.
2. Lazy concatenation: string + string returns a proxy object that
simply points to both; its materialized to a real string only when
needed (initially on any other string operation but it's easy to
implement things like iteration without requiring it). Similar
approaches could work for lists and dictionaries, BTW. For the
later I mean the non-commutative union of two dicts where one has
priority and the other is the fallback; this would be handy for
cases where consing to alists is used in LISP...
Beni Cherniavsky <cben at tx.technion.ac.il>
More information about the Python-list