I added a _PyUnicodeWriter internal API to optimize str%args and str.format(args). It uses an overallocated buffer, so it's basically like the CPython str += str optimization. I still don't know how efficient it is on Windows, since realloc() is slow on Windows (at least on old Windows versions).
We should add an official and public API to concatenate strings. I know that PyPy already has its own API. Example:
    writer = UnicodeWriter()
    for item in data:
        writer += item   # I guess that it's faster than writer.append(item)
    return str(writer)   # or writer.getvalue()?
I don't care about the exact implementation of UnicodeWriter; it just has to be as fast as or faster than ''.join(data).
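To make the idea concrete, here is a minimal pure-Python sketch of what such a writer could look like. It is only an illustration: the class name and the append()/getvalue() methods come from the example above and are hypothetical, not an existing stdlib or PyPy API, and the list-backed buffer is just one possible implementation (a real one would probably be written in C on top of an overallocated buffer).

```python
class UnicodeWriter:
    """Hypothetical string builder; not an existing stdlib or PyPy API."""

    def __init__(self):
        # Pieces are collected in a list and only joined on demand.
        self._chunks = []

    def append(self, text):
        if not isinstance(text, str):
            raise TypeError("expected str, got %s" % type(text).__name__)
        self._chunks.append(text)

    def __iadd__(self, text):
        # Supports "writer += item"; must return self to keep the binding.
        self.append(text)
        return self

    def getvalue(self):
        # Collapse the chunks so repeated calls don't re-join everything.
        if len(self._chunks) > 1:
            self._chunks = ["".join(self._chunks)]
        return self._chunks[0] if self._chunks else ""

    # str(writer) returns the same result as writer.getvalue().
    __str__ = getvalue


def concat(data):
    """Concatenate an iterable of str using the sketch above."""
    writer = UnicodeWriter()
    for item in data:
        writer += item
    return str(writer)
```

With this sketch, concat(data) returns the same string as ''.join(data); whether it can be made as fast is exactly the open question.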
I don't remember if _PyUnicodeWriter is faster than StringIO or slower. I created an issue for that: http://bugs.python.org/issue15612
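For anyone who wants to measure this themselves, here is a small timing sketch. It only compares approaches that are reachable from Python code (''.join, io.StringIO and +=); _PyUnicodeWriter is an internal C API, so it is not covered. The helper names and the sample data are my own choices for illustration, and the numbers will depend on the interpreter and platform.

```python
import io
import timeit


def with_join(data):
    return "".join(data)


def with_stringio(data):
    buf = io.StringIO()
    for item in data:
        buf.write(item)
    return buf.getvalue()


def with_iadd(data):
    # Relies on the CPython refcount == 1 optimization to be fast.
    result = ""
    for item in data:
        result += item
    return result


def compare(data, number=100):
    """Return {function name: seconds for `number` runs} for each approach."""
    expected = "".join(data)
    timings = {}
    for func in (with_join, with_stringio, with_iadd):
        # Check correctness before timing anything.
        assert func(data) == expected
        timings[func.__name__] = timeit.timeit(lambda: func(data), number=number)
    return timings


if __name__ == "__main__":
    data = ["abc"] * 10000  # arbitrary sample input
    for name, seconds in sorted(compare(data).items(), key=lambda kv: kv[1]):
        print("%-14s %.4f s" % (name, seconds))
```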
2013/2/12 Maciej Fijalkowski firstname.lastname@example.org:
We recently encountered a performance issue in stdlib for pypy. It turned out that someone committed a performance "fix" that uses += for strings instead of the "".join() that was there before.
Now this hurts pypy (we can mitigate it to some degree though) and possibly Jython and IronPython too.
How do people feel about generally not having += on long strings in stdlib (since the refcount = 1 thing is a hack)?
What about other performance improvements in stdlib that are problematic for pypy or others?
Personally, I would prefer cleaner code in stdlib over speeding up CPython. Typically that also helps pypy, so I'm not unbiased.
Cheers,
fijal