[Tutor] a little loop

Thu May 30 22:35:18 CEST 2013

On Wed, May 29, 2013 at 12:51 PM, boB Stepp <robertvstepp at gmail.com> wrote:
> On Wed, May 29, 2013 at 11:07 AM, Oscar Benjamin
> <oscar.j.benjamin at gmail.com> wrote:
>
>> I don't know exactly how str.join is implemented but it does not use
>> this quadratic algorithm. For example if str.join would first compute
>> the length of the resulting string first then it can allocate memory
>> for exactly one string of that length and copy each substring to the
>> appropriate place (actually I imagine it uses an exponentially
>> resizing buffer but this isn't important).
>
> ...str.join gets around these issues? In the linked article it was
> discussing increasing memory allocation by powers of two instead of
> trying to determine the exact length of the strings involved,
> mentioning that the maximum wasted memory would be 50% of what was
> actually needed. Is Python more clever in its implementation?

CPython computes the exact length required. Nothing clever. It first
expands the iterable into a list. It joins the strings in two passes.
In the first pass it computes the total size (3.3 also has to
determine the 'kind' of unicode string in this loop, i.e. ASCII,
2-byte, etc). Then it allocates a new string and copies in the data in
a second pass.

>> * CPython actually has an optimisation that can append strings in
>> precisely this situation. However it is an implementation detail of
>> CPython that may change and it does not work in other interpreters
>> e.g. Jython. Using this kind of code can damage portability since your
>> program may run fine in CPython but fail in other interpreters.
>>
> You are speaking of "appending" and not "concatenation" here?

In terms of sequence methods, it's inplace concatenation. On their
own, immutable string types only support regular concatenation, but
the interpreter can evaluate the concatenation inplace for special
cases. Specifically, it can resize the target string in an INPLACE_ADD
if it's not interned and has only *one* reference. Also, the reference
has to be a local variable; it can't be a global (unless at module
level), an attribute, or a subscript.

Here's an example (tested in 2.7 and 3.3).

Interned strings:

    >>> s = 'abcdefgh' * 128

CPython code objects intern their string constants that are all name
characters (ASCII alphanumeric and underscore). But that's not an
issue if you multiply the string to make it longer than 20 characters.
A sequence length of 20 is the cutoff point for compile-time constant
folding. This keeps the code object size under wraps. The number 20
was apparently chosen for obvious reasons (at least to someone).
Anyway, if the string is determined at runtime, it won't be interned.

But really I'm concatenating the base string with itself so many times
to avoid using the Pymalloc object allocator (see the note below). Its
block sizes are fine grained at just 8 bytes apart. Depending on your
system I don't know if adding even one more byte will push you up to
the next block size, which would defeat an example based on object
id(). I'll take my chances that the stdlib realloc() will be able to
grow the block, but that's not guaranteed either. Strings should be
treated as immutable at all times. This is just a performance
optimization.

The reference count must be 1:

    >>> sys.getrefcount(s)
    2

Hmmm. The reference count of the string is incremented when it's
loaded on the stack, meaning it will always be at least 2. As such,
the original variable reference is deleted before in-place
concatenation. By that I mean that if you have s += 'spam', then mid
operation s is deleted from the current namespace. The next
instruction stores the result back.

Voilà:

    >>> id_s = id(s)
    >>> s += 'spam'
    >>> id(s) == id_s
    True

Note on object reallocation:
The following assumes CPython is built with the Pymalloc small-object
allocator, which is the default configuration in 2.3+. Pymalloc
requests memory from the system in 256 KiB chunks calls arenas. Each
arena is partitioned into 64 pools. Each pool has a fixed block size,
and block sizes increase in steps of 8, from 8 bytes up to 256 bytes
(up to 512 bytes in 3.3).

Resizing the string eventually calls PyObject_Realloc. If the object
isn't managed by Pymalloc, the call to PyObject_Realloc punts to the C
stdlib realloc. Otherwise if the new size maps to a larger block size,
or if it's shrinking by more than 25% to a smaller block size, the
allocation punts to PyObject_Malloc. This allocates a block from the
first available pool. If the requested size is larger than the maximum
block size, it punts to the C stdlib malloc.