Speed of string += string

Sat Apr 12 20:46:29 EDT 2003

[Carl Banks]
> Surprisingly, += doesn't appear to be a major time waster for
> small (meaning about 10000 bytes) strings.  I'm guessing it's
> because the malloc my system (Debian Linux) is optimized for small
> memory allocations.

[Skip Montanaro]
> Actually, if you're using 2.2.n or later, it's Python's small object
> allocator (see Objects/obmalloc.c), which is providing most of
> the speedup.  I'm not sure if n is 0, 1, or 2.

obmalloc isn't enabled by default before Python 2.3, where it has given
dramatic speedups in some cases (where "dramatic" depends on the platform
and the case at hand; it's generally faster everywhere, but not always
dramatically so).

There was a ton of tuning and reworking to get it to work well with
new-style classes & new dict allocation strategies, to make it 100% binary
compatible at the C API level with older Python releases, and to plug a deep
security hole.  In all, that was too much work to backport to the 2.2 line.
Enabling the optional obmalloc before 2.3 can't be recommended, because "the
security hole" mentioned allowed writing pure Python code that could
scribble over the platform malloc's internal memory.  You had to know a lot
about obmalloc internals to do that on purpose, but you didn't have to know
anything about 'em to do it by accident (a "magic cookie" distinguished
obmalloc's memory from the platform malloc's, and before 2.3 obmalloc relied
on the unlikelihood of platform malloc memory containing the magic bit
pattern by accident).

WRT joining short strings, memcpy() is quite zippy on Pentium-class boxes,
and an O(N**2) algorithm can be quicker than an O(N) one across a large
range of small N if the latter's overhead is smaller than the former's.  For
example, if a simple algorithm takes N**2 units of time, and a fancy one
100*N units, the simple algorithm is quicker for N < 100, despite being
horrid for N much larger than that.

Another kind of common string catentation task is joining a small number of
strings, where some are fixed.  Like

    '(' + guts + ')'

versus
    ''.join(['(', guts, ')'])

versus
    '(%s)' % guts

The second and third ways are about twice as fast as the first as len(guts)
increases, but they've got more fixed overhead than the first way (for
example, the third way always has the fixed runtime overhead of parsing and
interpreting the intent of the format string).  Because of that, the first
way can actually be the quickest for all len(guts) an app is likely to see,
and especially in 2.3 (because obmalloc's fixed overhead for small-object
allocation in 2.3 is indeed small).

worrying-about-small-speed-differences-is-a-way-of-life-but-not-a-
    good-one-ly y'rs  - tim