Hi friends,

Efficient string concatenation was already a topic back in 2004.
Armin Rigo proposed a patch with the same name as this subject line,
more precisely:

[Patches] [ python-Patches-980695 ] efficient string concatenation
on sourceforge.net, on 2004-06-28.

This patch was finally added to Python 2.4 on 2004-11-30.

Some people might remember the larger discussion about whether such a patch
should be accepted at all, because it changes the recommended programming style
for many of us from "don't do that, stupid" to "well, you may do it in CPython",
which has quite some impact on other implementations (is it fast on Jython now?).

For instance, it changed my programming and teaching style a lot, of course!

But I think nobody but people heavily involved in PyPy expected this:

Now, more than eight years after that patch appeared and made it into 2.4,
PyPy (!) still does _not_ have it!

Obviously I was misled by other optimizations, and by the fact that
this patch came from a major author of PyPy, the very person who invented
the initial patch for CPython. That it would land in PyPy as well sooner
or later was beyond question for me. Wrong... ;-)

Yes, I agree that for PyPy it is much harder to implement without the
refcounting trick, and probably even more difficult in the case of the JIT.
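To illustrate what I mean by the refcounting trick: CPython's hack only fires
when the target string is the sole reference holder, so it can resize the
buffer in place instead of copying. A small sketch (my own illustration, not
part of the original patch; function names are mine) shows how keeping an
extra reference defeats the optimization and falls back to quadratic copying:

```python
from timeit import default_timer as timer

def concat_sole_reference(n):
    # s is the only reference to the string, so CPython's hack can
    # resize the buffer in place: roughly O(n) overall.
    s = ''
    for _ in range(n):
        s += 'X'
    return s

def concat_with_alias(n):
    # A second reference to s means the old object might still be
    # observed, so CPython must copy on every +=: O(n**2) overall.
    s = ''
    for _ in range(n):
        alias = s
        s += 'X'
    return s

if __name__ == '__main__':
    for fn in (concat_sole_reference, concat_with_alias):
        tim = timer()
        fn(100000)
        print('{}: {:0.3f}s'.format(fn.__name__, timer() - tim))
```

Both functions build the same string; only the refcount of the left-hand
operand differs, and with it the asymptotic cost on CPython.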

But nevertheless, I tried to find any reference to this missing crucial
optimization, and had no success after an hour of searching (*).

And I guess many other people are stepping into the same trap.

So I can imagine that PyPy loses some of its speed in many programs because
Armin's great hack did not make it into PyPy, and this fact is not loudly
declared anywhere. I believe efficient string concatenation is something
that people assume by default and fold into the vague CPython compatibility
claim, unless they are explicitly told otherwise.


Some silly proof, using Python 2.7.3 vs. PyPy 1.9:

$ cat strconc.py
#!/usr/bin/env python

from timeit import default_timer as timer

tim = timer()

s = ''
for i in xrange(100000):
    s += 'X'

tim = timer() - tim

print 'time for {} concats = {:0.3f}'.format(len(s), tim)

$ python strconc.py
time for 100000 concats = 0.028
$ pypy strconc.py
time for 100000 concats = 0.804
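For comparison, the portable idiom that is fast on CPython and PyPy alike is
to collect the pieces in a list and join once at the end. A sketch (my own
addition, mirroring the benchmark above):

```python
from timeit import default_timer as timer

def concat_join(n):
    # Appending to a list is amortized O(1); the final join builds
    # the result string in a single O(n) pass, on any implementation.
    parts = []
    for _ in range(n):
        parts.append('X')
    return ''.join(parts)

if __name__ == '__main__':
    tim = timer()
    s = concat_join(100000)
    print('time for {} concats via join = {:0.3f}'.format(len(s), timer() - tim))
```

This does not depend on the refcounting trick at all, which is exactly why
it is the style we used to teach before the 2.4 patch changed our habits.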

Something is needed - a patch for PyPy or for the documentation I guess.

This is not just some unoptimized function in some module; this pattern is
used all over the place and has become very common since it was introduced.

How ironic that a foreseen problem occurs _now_, and _there_ :-)

cheers -- chris

Christian Tismer             :^)   <mailto:tismer@stackless.com>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship* http://starship.python.net/
14482 Potsdam                :     PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/