Slow String Repeat (was: [Python-Dev] PyBuffer* vs. array.array())

Mon, 06 Jan 2003 15:42:01 +0100

Raymond Hettinger wrote:

[ " " * 1000000  ten times slower than  " " * 1000 * 1000 ]

> This ought to do it:
> 
> 
> i = 0;
> if (size >= 1 ){                // copy in first one
>         memcpy(op->ob_sval, a->ob_sval, (int) a->ob_size);
>         i = (int) a->ob_size;
> }
> for ( ; i + i < size ; i <<= 1) // grow in doubles
>         memcpy(op->ob_sval+i, op->ob_sval, i);
> if (i < size )                  // fill remainder
>         memcpy(op->ob_sval+i, op->ob_sval, size-i);

Looks good, not too much code to add.
This solves the problem for string repetition.

Theoretically, there are other objects which
might expose a similar problem: If the overhead
of starting the copy process is larger than
the actual copying process, then it is cheaper
to do the repeat operation stepwise.

I'm asking whether it makes sense to look for
every possible occurrence of such and fix it
like above?
I could imagine not to change string repeat and
others, but the abstract implementation of the
repetition of a sequence.
An algorithm like the above could be written
for general sequences, and do this break-up
on the abstract level, once and for all
repetitions of arbitrary objects.

But I have to admit that this is a bit less
efficient, since the target object cannto be
allocated in advance, before looking into the
actual type.

cheers -- chris

-- 
Christian Tismer             :^)   <mailto:tismer@tismer.com>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/