[New-bugs-announce] [issue38373] List overallocation strategy

Serhiy Storchaka report at bugs.python.org
Fri Oct 4 18:43:43 EDT 2019


New submission from Serhiy Storchaka <storchaka+cpython at gmail.com>:

Currently list uses the following formula for allocating an array for items (if size != 0):

    allocated = size + (size >> 3) + (3 if size < 9 else 6)

If add items by 1 the growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...

I think this strategy can be improved.

1. There is no much sense in allocating the space for 25 items. The standard Python allocator returns addresses aligned to 16 bytes on 64-bit platform. It will allocate a space for 26 items. It is more efficient if the starting address is a multiple of some power of two, and perhaps larger than the pointer size.

So it may be better to allocate a multiple of two or four items. Few first sizes are multiple of four, so this is good multiplier. I suggest to use the following expression:

    allocated = (size + (size >> 3) + 6) & ~3

It adds bits AND, but removes conditional value.

2. It is common enough case if the list is created from a sequence of a known size and do not adds items anymore. Or if it is created by concatenating of few sequences. In such cases the list can overallocate a space which will never be used. For example:

>>> import sys
>>> sys.getsizeof([1, 2, 3])
80
>>> sys.getsizeof([*(1, 2, 3)])
104

List display allocates a space for exactly 3 items. But a list created from a tuple allocates a space for 6 items.

Other example. Let extend a list by 10 items at a time.

size allocated
 10     17
 20     28
 30     39
 40     51
 50     51
 60     73
 70     73
 80     96
 90     96
100    118

We need to reallocate an array after first four steps. 17 is too small for 20, 28 is too small for 30, 39 is too small for 40. So overallocating does not help, it just spends a space.

My idea, that if we adds several items at a time and need to reallocate an array, we check if the overallocated size is enough for adding the same amount of items next time. If it is not enough, we do not overallocate.

The final algorithm is:

    allocated = (size + (size >> 3) + 6) & ~3
    if size - old_size > allocated - size:
        allocated = (size + 3) & ~3

It will save a space if extend a list by many items few times.

----------
components: Interpreter Core
messages: 353976
nosy: rhettinger, serhiy.storchaka, tim.peters
priority: normal
severity: normal
status: open
title: List overallocation strategy
type: resource usage
versions: Python 3.9

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue38373>
_______________________________________


More information about the New-bugs-announce mailing list