Slow String Repeat (was: [Python-Dev] PyBuffer* vs. array.array())
Raymond Hettinger
python@rcn.com
Mon, 6 Jan 2003 02:49:50 -0500
----- Original Message -----
From: "Guido van Rossum" <guido@python.org>
To: "Christian Tismer" <tismer@tismer.com>
Cc: <python-dev@python.org>
Sent: Sunday, January 05, 2003 7:33 PM
Subject: Re: Slow String Repeat (was: [Python-Dev] PyBuffer* vs. array.array())
> > > >>> if 1:
> > > ... t = time.clock()
> > > ... for i in xrange(100):
> > > ... s = ' ' * 1000 * 1000
> > > ... print time.clock()-t
> > > ...
> > > 0.674784644417
> > > >>> if 1:
> > > ... t = time.clock()
> > > ... for i in xrange(100):
> > > ... s = ' ' * 1000000
> > > ... print time.clock()-t
> > > ...
> > > 6.28695295072
> > > >>>
> >
> > Analysis:
> > The central copying code in stringobject.c is the following
> > tight loop:
> >
> > for (i = 0; i < size; i += a->ob_size)
> > memcpy(op->ob_sval+i, a->ob_sval, (int) a->ob_size);
> >
> > For my example, this memcpy is started for every single
> > of the one million bytes. So the overhead of memcpy,
> > let is be a function call or a macro, will be executed
> > a million times.
> >
> > On the other hand, doing ' ' * 1000 * 1000 only
> > has to call memcpy 2000 times.
>
> Do I hear a challenge for Raymond H? :-)
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)
This ought to do it:
i = 0;
if (size >= 1 ){ // copy in first one
memcpy(op->ob_sval, a->ob_sval, (int) a->ob_size);
i = (int) a->ob_size;
}
for ( ; i + i < size ; i <<= 1) // grow in doubles
memcpy(op->ob_sval+i, op->ob_sval, i);
if (i < size ) // fill remainder
memcpy(op->ob_sval+i, op->ob_sval, size-i);
Raymond Hettinger