[Python-ideas] Length hinting and preallocation for container types
Eli Bendersky
eliben at gmail.com
Tue Mar 5 18:15:37 CET 2013
On Tue, Mar 5, 2013 at 9:05 AM, Christian Heimes <christian at python.org>wrote:
> Hello,
>
> today I came across this slides
> https://speakerdeck.com/alex/why-python-ruby-and-javascript-are-slow by
> Alex Gaynor. The slides aren't some random rants on Python. Alex makes
> some valid points. Coincidentally he is a PyPy developer, too. ;)
>
> One of his assertions is about memory (re)allocation. C is faster in
> some cases because C code usually does fewer allocations and
> reallocations. Python has no API to easily reallocate a list with 100
> items. Code like
>
> lst = []
> for i in range(100):
> lst.append(i*i)
>
> has to resize the list multiple times. PyPy has a feature to create a
> preallocated list. https://bitbucket.org/pypy/pypy/commits/2ff5e3c765ef/
>
> Internally CPython already distinguishes between the length of object
> and the allocation size of an object for some types like list, dict and
> bytearray. For example PyListObject has `ob_size` for __len__ and
> `allocated` for the amount of available `ob_item` slots.
>
> I suggest that we add two new functions to container types like list,
> bytearray and dict. obj.__preallocate__(size) increases the internal
> buffer by size elements. obj.__shrink__() dwindles the internal buffer
> to len(obj) elements, maybe a bit more.
>
> A new context manager aids users with preallocation and shrinking:
>
> class LengthHint:
> def __init__(self, container, hint):
> self.container = container
> self.hint = hint
> self.instance = None
>
> def __enter__(self):
> self.instance = self.container()
> self.instance.__preallocate__(self.hint)
> return self.instance
>
> def __exit__(self, exc_type, exc_val, exc_tb):
> self.instance.__shrink__()
>
>
> with LengthHint(list, 200) as lst:
> # lst has 200 ob_item slots but len(lst) == 0
> for i in range(100):
> lst.append(i*i)
> # __exit__ shrinks ob_item to 100
>
The real problem is that this code is not idiomatic Python, especially if
you want it to be reasonably fast:
lst = []
for i in range(100):
lst.append(i*i)
Why not:
lst = [i*i for i in range(100)]
If the "append" pattern is complex, just "preallocate" like this:
lst = [0] * 100
And then fill it.
Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130305/23419766/attachment.html>
More information about the Python-ideas
mailing list