[Python-Dev] Usage of += on strings in loops in stdlib

Victor Stinner victor.stinner at gmail.com
Wed Feb 13 09:02:07 CET 2013


I added an internal _PyUnicodeWriter API to optimize str % args and
str.format(args). It uses an overallocated buffer, so it works
basically like CPython's str += str optimization. I still don't know how
efficient it is on Windows, since realloc() is slow there (at
least on old Windows versions).

We should add an official, public API to concatenate strings. I
know that PyPy already has its own API. Example:

writer = UnicodeWriter()
for item in data:
    writer += item   # I guess this is faster than writer.append(item)
return str(writer) # or writer.getvalue() ?

I don't care about the exact implementation of UnicodeWriter; it just
has to be as fast as or faster than ''.join(data).
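
To make the proposed interface concrete, here is a minimal pure-Python
stand-in. The name UnicodeWriter, the += usage, and getvalue() come from
the example above; the list-plus-join internals and the concat() helper
are only assumptions for illustration and say nothing about how a real
implementation (for instance one built on _PyUnicodeWriter) would work.

```python
class UnicodeWriter:
    """Hypothetical string builder; NOT an existing stdlib API."""

    def __init__(self):
        # Assumption: collect the pieces and join once at the end.
        self._parts = []

    def __iadd__(self, item):
        # Supports "writer += item"; must return self so the name stays bound
        # to the writer object.
        if not isinstance(item, str):
            raise TypeError("can only append str, not %s" % type(item).__name__)
        self._parts.append(item)
        return self

    def getvalue(self):
        return "".join(self._parts)

    def __str__(self):
        return self.getvalue()


def concat(data):
    # Hypothetical helper wrapping the loop from the example above.
    writer = UnicodeWriter()
    for item in data:
        writer += item
    return str(writer)
```

This sketch cannot beat ''.join(data), since it calls join itself; it only
shows the shape of the API, not the performance target.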

I don't remember if _PyUnicodeWriter is faster than StringIO or
slower. I created an issue for that:
http://bugs.python.org/issue15612
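
_PyUnicodeWriter is internal and cannot be called from Python code, but
the Python-level alternatives can be timed against each other. The
following is a rough sketch using timeit; the function names and the
compare() helper are made up for illustration, and the numbers will vary
by interpreter, version, and platform.

```python
import io
import timeit


def with_join(data):
    return "".join(data)


def with_stringio(data):
    buf = io.StringIO()
    for item in data:
        buf.write(item)
    return buf.getvalue()


def with_iadd(data):
    # Relies on CPython's in-place resize when the refcount is 1;
    # other implementations may copy on every iteration.
    result = ""
    for item in data:
        result += item
    return result


def compare(data, number=100):
    # Returns a dict mapping function name to total seconds for `number` runs.
    candidates = (with_join, with_stringio, with_iadd)
    return {f.__name__: timeit.timeit(lambda: f(data), number=number)
            for f in candidates}
```

For example, compare(["ab"] * 10000) returns one timing per strategy.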

Victor

2013/2/12 Maciej Fijalkowski <fijall at gmail.com>:
> Hi
>
> We recently encountered a performance issue in stdlib for pypy. It
> turned out that someone committed a performance "fix" that uses += for
> strings instead of the "".join() that was there before.
>
> Now this hurts pypy (we can mitigate it to some degree though) and
> possibly Jython and IronPython too.
>
> How do people feel about generally not having += on long strings in
> stdlib (since the refcount = 1 thing is a hack)?
>
> What about other performance improvements in stdlib that are
> problematic for pypy or others?
>
> Personally I would like cleaner code in stdlib vs speeding up CPython.
> Typically that also helps pypy so I'm not unbiased.
>
> Cheers,
> fijal

