[Python-ideas] Length hinting and preallocation for container types
Christian Heimes
christian at python.org
Tue Mar 5 18:05:44 CET 2013
Hello,
today I came across this slides
https://speakerdeck.com/alex/why-python-ruby-and-javascript-are-slow by
Alex Gaynor. The slides aren't some random rants on Python. Alex makes
some valid points. Coincidentally he is a PyPy developer, too. ;)
One of his assertions is about memory (re)allocation. C is faster in
some cases because C code usually does fewer allocations and
reallocations. Python has no API to easily reallocate a list with 100
items. Code like
lst = []
for i in range(100):
lst.append(i*i)
has to resize the list multiple times. PyPy has a feature to create a
preallocated list. https://bitbucket.org/pypy/pypy/commits/2ff5e3c765ef/
Internally CPython already distinguishes between the length of object
and the allocation size of an object for some types like list, dict and
bytearray. For example PyListObject has `ob_size` for __len__ and
`allocated` for the amount of available `ob_item` slots.
I suggest that we add two new functions to container types like list,
bytearray and dict. obj.__preallocate__(size) increases the internal
buffer by size elements. obj.__shrink__() dwindles the internal buffer
to len(obj) elements, maybe a bit more.
A new context manager aids users with preallocation and shrinking:
class LengthHint:
def __init__(self, container, hint):
self.container = container
self.hint = hint
self.instance = None
def __enter__(self):
self.instance = self.container()
self.instance.__preallocate__(self.hint)
return self.instance
def __exit__(self, exc_type, exc_val, exc_tb):
self.instance.__shrink__()
with LengthHint(list, 200) as lst:
# lst has 200 ob_item slots but len(lst) == 0
for i in range(100):
lst.append(i*i)
# __exit__ shrinks ob_item to 100
The C implementation is trivial as the three types already have all
features.
Christian
More information about the Python-ideas
mailing list