[Tutor] Strengths & weaknesses of Python lists compared to "old school" arrays [Was "Fixed Vector Array"]

Steven D'Aprano steve at pearwood.info
Thu Mar 5 13:20:50 CET 2015


On Wed, Mar 04, 2015 at 01:10:11PM -0600, boB Stepp wrote:

> What are the strengths and weaknesses of Python lists compared to "old
> school" arrays?

Python lists are objects, containing a sequence of objects. Objects are 
"fatter" than low-level data like machine ints. (They take up more 
space.)

Python lists are over-allocated. E.g. when you create an empty list, 
Python allocates enough space for (I'm guessing) eight items. When it 
fills up, and you add another item, the list is *quadrupled* to allow 
enough space for 32 items. When those items are all full, and you add 
another item, it quadruples in size again. At some point it stops 
quadrupling and merely doubles in size, but the over-all result is that 
*on average* Python lists are about 50% bigger than actually needed.

Why do they do that? Because that saves time. Python lists trade off a 
bit of extra space for extra speed. Resizing the list is expensive, and 
it can be mathematically proven that by doubling in size like this, 
appending to a list works out to be approximately constant time on 
average, regardless of whether the list is nearly empty or full. The 
technical term used is "constant time amortized".

Python lists can hold any arbitrary object. Arrays (from the `array` 
module) can only hold low-level types like C integers or floats. That 
saves memory, but is actually slower.


>     1) If the data involved is strictly numerical, does this alter
> your answer? Especially if multidimensional matrix calculations are
> involved?

Absolutely. In that case, use numpy, which has powerful numeric 
functions operating at high speed on low-level C-like arrays.

py> import numpy
py> L = [1] * 5000000
py> A = numpy.array(L)
py> with Stopwatch():
...     sum(L)
...
5000000
time taken: 0.098561 seconds
py> with Stopwatch():
...     numpy.sum(A)
...
5000000
time taken: 0.008987 seconds



>     2) If either numerical or string data is allowed to be stored,
> does this change your answer?

If you have mixed data, use lists.


>     3) Have "old school" arrays evolved into something more flexible
> in most programming languages?

I don't know about "most" languages, but generally high-level languages 
will tend to have something similar to Python lists, while low-level 
languages tend to have things closer to C arrays.


>     4) Are there other clarifying questions I should be asking on this
> topic that are not occurring to me?

Probably :-)


> > If Array returns a fixed size array can't you just always assign to the
> > index position. In other words does the array need to be filled
> > in a sequential manner or could you have a 'hole' in the middle (one of the
> > few things an array allows that a list doesn't - although you can just use a
> > dictionary if that's really important!

I'm not really sure that arrays can have holes in them. Basic low-level 
arrays, like found in Pascal, C or Fortran, cannot. There are ways and 
means of creating so-called "sparse arrays", but they typically don't 
allow you to just have arbitrary gaps in the array, rather they have 
some sort of symmetry or pattern, e.g. a two dimensional array like:

[ a b c d e f
  0 g h i j k
  0 0 l m n o
  0 0 0 p q r
  0 0 0 0 s t
  0 0 0 0 0 u ]

with all zeroes below the diagonal may be able to be programmed in a 
more efficient way. 


> What is a complete list of the "things" array-style thinking allow,
> but that Python lists don't? And for those "things", what would be a
> Python way of handling them?

The biggest -- probably only -- advantage is that with low-level arrays, 
you can use high-speed C functions that effectively operate on the whole 
array at once (for some definition of "at once") instead of relatively 
slow item-at-a-time processing like in Python. For ten items, or ten 
thousand, the difference is minimal, but for serious number-crunching 
will millions or hundreds of millions of values, numpy will out-perform 
anything you can write in pure Python.



-- 
Steve


More information about the Tutor mailing list