Speeding up: s += "string"
logistix at cathoderaymission.net
logistix at cathoderaymission.net
Tue Apr 15 13:53:28 EDT 2003
Lulu of the Lotus-Eaters <mertz at gnosis.cx> wrote in message news:<mailman.1050393203.4704.python-list at python.org>...
>
> It seems to me that the behavior CAN be improved. If strings were to
> use a preallocation strategy--similar to what lists do--MANY string
> appends could avoid a memory copy on the original string. The intial
> value could stay where it was, and the extra characters could live in
> the already-available contiguous space. Obviously, the utilized length
> would need to be stored somewhere. Once in a while, of course, the
> preallocation would fill up; in that case you'd need to do a costly
> strcopy()... but this could be avoided MOST of the time, ISTM.
>
This is the way the current list optimization works. It's based on
(generally valid) assumption that re'malloc has three cases:
You're re'mallocing to a smaller size, shrink allocation
You're re'mallocing to the same size, do nothing
YOu're re'mallocing to a bigger size, grab new memory and do a
memcpy()
The current optimization for list adjusts the malloc'ed memory based
on the size current size. Even for something like a 512Meg list, it
only uses a maximum of 16 Meg for padding.
The real problem is the fact that strings are immutable. From the
bytecode perspective, I don't think it's possible to reliably assume
that if string has a refcount = x in a '+' operation that it's not a
shared reference (where you couldn't optimize)
I submitted a patch to use the same allocation technique for
arraymodule, and it decreased my test case of 8 million appends from
12 hours to 4-5 seconds. Needless to say, the patch was accepted and
will be 2.3 ;)
Someone else had a feature request to allow strings to be .appended()
to array.array("c") objects, but there's no patch yet. If this were
implemented, array.array("c") would pretty much become a
"mutableString" class that'd be easy to use and run at reasonable
speeds. [I was thinking about writing a patch, and then started
obsessing about what sort of unicode coercions should take place and
my brain froze up]
More information about the Python-list
mailing list