unicode speed

jepler at unpythonic.net
Tue Nov 29 23:05:43 CET 2005

On Tue, Nov 29, 2005 at 09:48:15AM +0100, David Siroky wrote:
> Hi!
> I need to understand Python's unicode speed and implementation.
> My platform is AMD Athlon at 1300 (x86-32), Debian, Python 2.4.
> First a simple example (and time results):
> x = "a"*50000000
> real    0m0.195s
> user    0m0.144s
> sys     0m0.046s
> x = u"a"*50000000
> real    0m2.477s
> user    0m2.119s
> sys     0m0.225s
> So my first question is why does creating a unicode string take more than 10x
> longer than creating a non-unicode string?

String objects have the optimization described in the log message below.
The same optimization hasn't been made to unicode_repeat, though it would
probably also benefit from it.

r30616 | rhettinger | 2003-01-06 04:33:56 -0600 (Mon, 06 Jan 2003) | 11 lines

Optimize string_repeat.

Christian Tismer pointed out the high cost of the loop overhead and
function call overhead for 'c' * n where n is large.  Accordingly,
the new code only makes lg2(n) loops.

Interestingly, 'c' * 1000 * 1000 ran a bit faster with old code.  At some
point, the loop and function call overhead became cheaper than invalidating
the cache with lengthy memcpys.  But for more typical sizes of n, the new
code runs much faster and for larger values of n it runs only a bit slower.
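
To make the idea in that log message concrete, here is a rough sketch in
Python of the doubling approach. This is not the actual C code: the name
repeat_by_doubling is made up for illustration, and a bytearray stands in
for the C buffer. Instead of copying the unit n times, it copies the part
of the buffer that is already filled, so each pass doubles the finished
region and only about lg2(n) copies are needed.

```python
def repeat_by_doubling(unit: bytes, n: int) -> bytes:
    """Build unit * n with roughly log2(n) bulk copies (illustrative only)."""
    if n <= 0 or not unit:
        return b""
    size = len(unit) * n
    buf = bytearray(size)          # preallocate the whole result
    buf[:len(unit)] = unit         # seed the buffer with one copy of the unit
    done = len(unit)               # number of bytes filled so far
    while done < size:
        # Copy as much of the filled prefix as still fits in the buffer.
        chunk = min(done, size - done)
        buf[done:done + chunk] = buf[:chunk]
        done += chunk
    return bytes(buf)
```

The result should match the builtin operator, e.g.
repeat_by_doubling(b"ab", 3) == b"ab" * 3. The large copies in the later
passes are the "lengthy memcpys" the log message mentions.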

If you're a "C" coder too, consider creating and submitting a patch to do this
to the patch tracker on http://sf.net/projects/python .  That's the best thing
you can do to ensure the optimization is considered for a future release of
