[Python-ideas] Length hinting and preallocation for container types

Christian Heimes christian at python.org
Tue Mar 5 18:05:44 CET 2013


Hello,

today I came across this slides
https://speakerdeck.com/alex/why-python-ruby-and-javascript-are-slow by
Alex Gaynor. The slides aren't some random rants on Python. Alex makes
some valid points. Coincidentally  he is a PyPy developer, too. ;)

One of his assertions is about memory (re)allocation. C is faster in
some cases because C code usually does fewer allocations and
reallocations. Python has no API to easily reallocate a list with 100
items. Code like

  lst = []
  for i in range(100):
      lst.append(i*i)

has to resize the list multiple times. PyPy has a feature to create a
preallocated list. https://bitbucket.org/pypy/pypy/commits/2ff5e3c765ef/

Internally CPython already distinguishes between the length of object
and the allocation size of an object for some types like list, dict and
bytearray. For example PyListObject has `ob_size` for __len__ and
`allocated` for the amount of available `ob_item` slots.

I suggest that we add two new functions to container types like list,
bytearray and dict. obj.__preallocate__(size) increases the internal
buffer by size elements. obj.__shrink__() dwindles the internal buffer
to len(obj) elements, maybe a bit more.

A new context manager aids users with preallocation and shrinking:

class LengthHint:
    def __init__(self, container, hint):
        self.container = container
        self.hint = hint
        self.instance = None

    def __enter__(self):
        self.instance = self.container()
        self.instance.__preallocate__(self.hint)
        return self.instance

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.instance.__shrink__()


with LengthHint(list, 200) as lst:
    # lst has 200 ob_item slots but len(lst) == 0
    for i in range(100):
        lst.append(i*i)
# __exit__ shrinks ob_item to 100


The C implementation is trivial as the three types already have all
features.

Christian



More information about the Python-ideas mailing list