[Tutor] a little loop

eryksun eryksun at gmail.com
Fri May 31 04:55:13 CEST 2013


On Thu, May 30, 2013 at 3:16 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>
> Sure enough, ''.join(list-of-substrings) is measurably faster than
> ''.join(iterator-of-substrings).

A tuple or list is used directly. Otherwise join() has to create an
iterator and build a new list.

This isn't directly related to the discussion on string concatenation.
But in relation to the offshoot discussion on lists, I've put together
some examples that demonstrate how different was of creating the same
list lead to different allocated sizes.

First, here's a function to get the allocated length of a list's item array:

    from ctypes import sizeof, c_void_p, c_ssize_t

    alloc_offset = sizeof(c_void_p * 2) + sizeof(c_ssize_t * 2)

    def allocated(alist):
        addr = id(alist)
        alloc = c_ssize_t.from_address(addr + alloc_offset)
        return alloc.value

It uses ctypes to peek into the object, but you can also use a list
object's __sizeof__() method to calculate the result. First get the
array's size in bytes by subtracting the size of an empty list from
the size of the list. Then divide by the number of bytes in a pointer:

    import struct

    pointer_size = struct.calcsize('P')
    empty_size = [].__sizeof__()

    def allocated(alist):
        size_bytes = alist.__sizeof__() - empty_size
        return size_bytes // pointer_size


Example 0:

    >>> allocated([0,1,2,3,4,5,6,7,8,9,10,11])
    12

In this case the constants are pushed on the stack, and the
interpreter evaluates BUILD_LIST(12), which in CPython calls
PyList_New(12). The same applies to using built-in range() in 2.x (not
3.x):

    >>> allocated(range(12))
    12

Example 1:

    >>> allocated([i for i in xrange(12)])
    16

This starts at 0 and grows as follows:

    1 + 1//8 + 3 = 4
    5 + 5//8 + 3 = 8
    9 + 9//8 + 6 = 16


Example 2:

    >>> allocated(list(xrange(12)))
    19

This also applies to range() in 3.x. Some iterators have a
"__length_hint__" method for guessing the initial size:

    >>> iter(xrange(12)).__length_hint__()
    12

The guess is immediately resized as follows:

    12 + 12//8 + 6 = 19


Example 3:

    >>> allocated(list(i for i in xrange(12)))
    12

The initializer here is a generator, compiled from the generator
expression. A generator doesn't have a length hint. Instead, the list
uses a default guess of 8, which is over-allocated as follows:

    8 + 8//8 + 3 = 12

If the generator continues to a 13th item, the list resizes to 13 +
13//8 + 6 = 20:

    >>> allocated(list(i for i in xrange(13)))
    20


More information about the Tutor mailing list