[Tutor] Memory error - how to manage large data sets?

Alan Gauld alan.gauld at btinternet.com
Tue Jul 29 01:09:30 CEST 2008

"Karthik" <rs.karthick at gmail.com> wrote in message 
news:488dfe6a.04506e0a.40cd.23a1 at mx.google.com...
> Forgot to include the following information,
> Platform - win32
> Version - 2.5.1
> Error message:
> Traceback (most recent call last):
>  File "C:\Python25\programs\fibo.py", line 10, in <module>
>    if i % 2 == 0:
> MemoryError

OK, It does look like you bust your memory allocation.

> Code:
> fib = []
> even = []

You don't really need two lists.

> def fibonacci(x,y):
>    return x+y

And this isn't really the fibonacci function its an addition
which you don;t need(see below)

> for i in range (0,1000000):

This creates a 3rd list of 1 million numbers, so thats
3 million * the storage of one number. Still not enough
to trouble most PCs nowadays...

>    if i < 2:
>        fib.append(i)
>    else:
>        i = fib[i-1] + fib[i-2]
>        if i % 2 == 0:
>            fib.append(i)

You want to append to fib on every number not just on evens?
So just take this outside the if.

>            even.append(i)

You don't need this because we can extract the even numbers
when we do the summation later. All its doing it gobbling up space.

>        else:
>            fib.append(i)

dispense with the else and just put the fib append outside.
ie your code becomes

for i in xrange (0,1000000):  # xranmge should avoid the extra list
    if i >= 2:
        i = fib[i-1] + fib[i-2]

> total = reduce(fibonacci,even)

You could use the sum() function in Python 2.5 but for
your purpose I'd propose using the fib list and using slicing
to select every second element:

>>> L = range(10)
>>> L[::2]
[0, 2, 4, 6, 8]

So you could use

total = sum(fib[::2])

and dispense with evens entirely

> Any pointers would be of great help to me.

Frankly I'm puzzled why its using so much RAM.
I'd expect resource usage to be around 20-30MB
tops. hmmm, OK a bit of experimentation shows
that the fibonacci series generates very big numbers
very quickly so each entry is a long number - a very
long number!

21000 characters for the 100,000th entry!

No wonder it blows up.

You have two choices - use floats, which might still
not be big enough! ... It isn't,  98500 out of 100000 entries
were infinite using floats! So you need to calculate the
total as you go without saving the values

Or buy a much, much bigger computer! :-)

PS I did all of that investigation using the >>> prompt.
Its a very useful tool for trying things out, I just generated
the list (using 100,000 entries) and examined the contents
using various list comprehensions and len() etc


Alan Gauld
Author of the Learn to Program web site

More information about the Tutor mailing list