Most efficient way to "pre-grow" a list?

gil_johnson gil_johnson at earthlink.net
Sun Nov 8 13:45:20 CET 2009


On Nov 6, 8:46 pm, gil_johnson <gil_john... at earthlink.net> wrote:

> >>> arr[0] = initializer
> >>> for i in range N:
> >>>      arr.extend(arr)
>
> This doubles the array every time through the loop, and you can add
> the powers of 2 to get the desired result.
> Gil

To all who asked about my previous post,

I'm sorry I posted such a cryptic note before, I should have waited
until I
had the code available.

I am still new to Python, and I couldn't find a way to create an array
of
size N, with each member initialized to a given value. If there is
one,
please let me know.

I think Paul Rubin and Paul Rudin are right, if they really are 2
different
people, an array is a better solution if you're dealing with integers.
They
both mention numpy, which I know nothing about, or the array module.

kj, what *are* you going to do with this list/array?
As others have pointed out, there are differences between lists,
arrays, and
dictionaries.

The problem I was solving was this: I wanted an array of 32-bit
integers to
be used as a bit array, and I wanted it initialized with all bits set,
that
is, each member of the array had to be set to 4294967295. Of course,
you
could set your initializer to 0, or any other 32-bit number.

Originally I found that the doubling method I wrote about before was a
LOT
faster than appending the elements one at a time, and tonight I tried
the
"list = initializer * N" method. Running the code below, the doubling
method is still fastest, at least on my system.

Of course, as long as you avoid the 'one at a time' method, we're
talking
about fractions of a second, even for arrays that I think are huge,
like
the 536,870,912 byte beastie below.

[code]

# Written in Python 3.x

import array
import time

#* * * * * * * * * * * * * * * * * * * * * *
# Doubling method, run time = 0.413938045502

t0 = time.time()

newArray = array.array('I')                   # 32-bit unsigned
integers

newArray.append(4294967295)

for i in range(27):                        # 2**27 integers, 2**29
bytes
    newArray.extend(newArray)

print(time.time() - t0)

print(newArray[134217727])          # confirm array is fully
initialized

#* * * * * * * * * * * * * * * * * * * * * *
# One at a time, run time = 28.5479729176

t0 = time.time()

newArray2 = array.array('I')

for i in range(134217728):                        # the same size as
above
    newArray2.append(4294967295)

print(time.time() - t0)

print(newArray2[134217727])                      # confirm array

#* * * * * * * * * * * * * * * * * * * * * *
# List with "*", run time = 1.06160402298

t0 = time.time()

newList = [4294967295] * 134217728

print(time.time() - t0)

print(newList[134217727])                      # confirm list

[/code]

If, instead of 134,217,728 integers, I want something different, like
100,000,000, the method I use is:

[code]

#* * * * * * * * * * * * * * * * * * * * * *
# Not a power of 2, run time = 0.752086162567

t0 = time.time()

newArray = array.array('I')

tempArray = array.array('I')

tempArray.append(4294967295)

size = 100000000

while size:                                          # chew through
'size' until it's gone
    if (size & 1):                                    # if this bit of
'size' is 1
        newArray.extend(tempArray)    # add a copy of the temp array
    size >>= 1                                     # chew off one bit
    tempArray.extend(tempArray)       # double the size of the temp
array

print(time.time() - t0)

print(newArray[99999999])

#* * * * * * * * * * * * * * * * * * * * * *
# # Not a power of 2, list with "*", run time = 1.19271993637

t0 = time.time()

newList = [4294967295] * 100000000

print(time.time() - t0)

print(newList[99999999])

[/code]

I think it is interesting that the shorter list takes longer than the
one
that is a power of 2 in length. I think this may say that the "list =
initializer * N" method uses something similar to the doubling method.

Also, tempArray (above) gets reallocated frequently, and demonstrates
that reallocation is not a big problem.

Finally, I just looked into calling C functions, and found
PyMem_Malloc,
PyMem_Realloc, PyMem_Free, etc. in the Memory Management section of
the
Python/C API Reference Manual. This gives you uninitialized memory,
and
should be really fast, but it's 6:45 AM here, and I don't have the
energy
to try it.

Gil



More information about the Python-list mailing list