[Tutor] memory consumption

Oscar Benjamin oscar.j.benjamin at gmail.com
Wed Jul 3 21:57:29 CEST 2013


On 3 July 2013 19:17, Andre' Walker-Loud <walksloud at gmail.com> wrote:
> Hi All,
>
> I wrote some code that is running out of memory.  It involves a set of three nested loops, manipulating a data file (array) of dimension ~ 300 x 256 x 1 x 2.  It uses some third party software, but my guess is I am just not aware of how to use proper memory management and it is not the 3rd party software that is the culprit.

You mention third-party software but you need to say which software. I
use numpy extensively so if that's what you mean I can help.

> Memory management is new to me, and so I am looking for some general guidance.  I had assumed that reusing a variable name in a loop would automatically flush the memory by just overwriting it. But this is probably wrong.

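It's not wrong unless you happen to be storing a reference to the
object somewhere else. Rebinding the name only drops one reference;
the memory is released once no references to the object remain.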
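
A minimal sketch of that point, assuming CPython (which frees an object
as soon as its last reference goes away). The Block class and the kept
list are hypothetical stand-ins, not anything from the original code:

```python
import weakref

class Block:
    """Hypothetical stand-in for a large object such as an array."""

# Case 1: the name is the only reference.
tmp = Block()
first = weakref.ref(tmp)   # watches the object without keeping it alive
tmp = Block()              # rebinding: the first Block has no references left
freed_after_rebinding = first() is None

# Case 2: a second reference is stored somewhere else.
tmp = Block()
kept = [tmp]               # e.g. a list or attribute that holds on to it
second = weakref.ref(tmp)
tmp = Block()              # rebinding again, but the list still holds the object
still_alive = second() is not None

print(freed_after_rebinding, still_alive)
```

On other Python implementations the first object may be freed later
rather than immediately, so the first result is CPython-specific.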

> Below is a very generic version of what I am doing. I hope there is something obvious I am doing wrong or not doing which I can do to dump the memory in each cycle of the innermost loop.  Hopefully, what I have below is meaningful enough, but again, I am new to this, so we shall see.

This is too generic and also incomplete. Ideally you should try to
post an example that someone else can actually run. The best way is to
progressively simplify your program while still verifying that the
confusing/incorrect behaviour occurs. You can find some handy tips
here: http://sscce.org/

>
> ################################################
> # generic code skeleton
> # import a class I wrote to utilize the 3rd party software
> import my_class
>
> # instantiate the function do_stuff
> my_func = my_class.do_stuff()
>
> # I am manipulating a data array of size ~ 300 x 256 x 1 x 2
> data = my_data  # my_data is imported just once and has the size above
>
> # instantiate a 3d array of size 20 x 10 x 10 and fill it with all zeros
> my_array = numpy.zeros([20,10,10])
> # loop over parameters and fill array with desired output
> for i in range(loop_1):
>     for j in range(loop_2):
>         for k in range(loop_3):

If the above is just supposed to say e.g. 'for i in range(20)' then
this is not a good simplification. By the way, numpy has a better
solution for this kind of nested loop problem:

>>> import numpy
>>> a = numpy.zeros([2, 3, 4], float)  # Always specify the type of the array
>>> a
array([[[ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.]],

       [[ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.]]])
>>> a.shape
(2, 3, 4)
>>> for i, j, k in numpy.ndindex(*a.shape):
...     print(i, j, k)
...
0 0 0
0 0 1
0 0 2
0 0 3
0 1 0
0 1 1
0 1 2
... and so on ...
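
As a rough sketch of how this could replace the three nested loops when
filling a results array: compute() below is a hypothetical placeholder
for whatever produces the floating point number for each (i, j, k), and
the small shape is only for illustration.

```python
import numpy

def compute(i, j, k):
    # Hypothetical stand-in for the real per-index calculation.
    return float(100 * i + 10 * j + k)

results = numpy.zeros([2, 3, 4], float)

# ndindex yields each index as a tuple, which can be used directly
# to address one element of the array.
for index in numpy.ndindex(*results.shape):
    results[index] = compute(*index)

print(results[1, 2, 3])
```

This only tidies up the looping; it does not by itself change how much
memory the loop body uses.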

>             # create tmp_data that has a shape which is the same as data except the first dimension can range from 1 - 1024 instead of being fixed at 300
>
>             '''  Is the next line where I am causing memory problems? '''
>             tmp_data = my_class.chop_data(data,i,j,k)

I can't say whether any of the lines in the loop are or are not
causing memory problems as I don't know what any of them does.

>             my_func(tmp_data)

Given that you're not using the return value of my_func I assume that
it changes the state of something somewhere else. Are you storing a
reference to tmp_data somewhere in that function?
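
One quick way to check, sketched here under the assumption that you are
running CPython: compare sys.getrefcount() before and after the call.
The Worker class is a hypothetical callable that keeps what it is given;
it is not the real my_func.

```python
import sys

class Worker:
    """Hypothetical callable that quietly keeps everything it is given."""
    def __init__(self):
        self.history = []

    def __call__(self, data):
        self.history.append(data)   # this stored reference keeps data alive

worker = Worker()
data = [0.0] * 1000

before = sys.getrefcount(data)
worker(data)
after = sys.getrefcount(data)

# If the count went up, something inside the call kept a reference.
leaked = after > before
print(leaked)
```

If the count grows on every pass through the loop, each tmp_data is
being kept alive and memory use will grow with the number of iterations.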

>             my_func.third_party_function()

When making your code generic you should try to use a function name
that gives some indication of what it does in so far as it is
relevant; if it's not relevant then just remove it from your example.

>             my_array([i,j,k]) = my_func.results() # this is just a floating point number

This is the only line that I can definitely understand.
Unfortunately, it's a syntax error, e.g.:

>>> f([1,2,3]) = g()
  File "<stdin>", line 1
SyntaxError: can't assign to function call

The correct way to assign to an element of a numpy array (assuming
that's what you meant) is

    my_array[i, j, k] = my_func.results()


Oscar

