[Tutor] memory consumption

Dave Angel davea at davea.name
Wed Jul 3 20:55:33 CEST 2013


On 07/03/2013 02:17 PM, Andre' Walker-Loud wrote:
> Hi All,
>
> I wrote some code that is running out of memory.

And you know this how?  What OS are you using, and specifically how is 
it telling you that you've run out of memory?  And while you're at it, 
what version of Python?  And are the OS and Python 32 and 32 bit, or 64 
and 64, or mixed?


>  It involves a set of three nested loops, manipulating a data file (array) of dimension ~ 300 x 256 x 1 x 2.  It uses some third party software, but my guess is I am just not aware of how to use proper memory management and it is not the 3rd party software that is the culprit.

In particular you're using numpy, and I don't know its quirks.  So I'll 
just speak of Python in general, and let someone else address numpy.

>
> Memory management is new to me, and so I am looking for some general guidance.  I had assumed that reusing a variable name in a loop would automatically flush the memory by just overwriting it.

It could be useful to learn how Python manages memory.  To start
with, the 'variable' (the name) doesn't take a noticeable amount of
space.  It's the object it's bound to that might take up lots of space,
directly or indirectly.  When you bind a new object to the name, the
old object is freed, unless something else still refers to it.
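
A minimal sketch of that rebinding behavior, assuming CPython (where an
object is freed as soon as its reference count drops to zero).  The Big
class and all the names are hypothetical, purely for illustration; the
weak references only let us observe the objects without keeping them
alive.

```python
import weakref

class Big:
    """Stand-in for a large object (hypothetical, for illustration)."""
    def __init__(self, label):
        self.label = label

tmp = Big("first")
watcher = weakref.ref(tmp)    # observes the object without keeping it alive

tmp = Big("second")           # rebinding drops the only reference to "first"
first_is_gone = watcher() is None

other = tmp                   # a second name bound to the same object
watcher2 = weakref.ref(tmp)
tmp = Big("third")            # "second" survives: 'other' still refers to it
second_is_alive = watcher2() is not None

print(first_is_gone, second_is_alive)
```

So reusing a name in a loop does release the previous object, but only
if nothing else is still holding on to it.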

By indirectly, I refer to something like a list, which is one object, 
but which generally is bound to dozens or millions of others, and any of 
those may be bound to lots of others.  Unbinding the list will usually 
free up all that stuff.
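
As a rough illustration of the "indirectly" part, again assuming CPython
and Python 3, and using a hypothetical Item class: dropping the only
reference to a list also releases everything that only the list was
holding.

```python
import weakref

class Item:
    """Hypothetical element type; plain ints can't be weakly referenced."""
    pass

items = [Item() for _ in range(1000)]
watchers = [weakref.ref(obj) for obj in items]

del items    # unbind the only name that refers to the list

# Count how many of the elements are still alive after the list is gone.
alive = sum(1 for w in watchers if w() is not None)
print(alive)
```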

The other thing that can happen is an object may indirectly be bound to 
itself.  Trivial example:

 >>> mylist = [1,2]
 >>> mylist.append(mylist)
 >>> mylist
[1, 2, [...]]
 >>>

Fortunately for us, the repr() display of mylist doesn't descend 
infinitely into the guts of the elements, or it would be still printing 
next week (or until the printing logic ran out of memory).

Anyway, once you have such a binding loop, the simple memory freeing 
logic (refcount) has to defer to the slower and less frequently run gc 
(garbage collection).
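
A small sketch of that difference, assuming CPython.  A hypothetical
Node class stands in for the list above, because a plain list can't be
the target of a weak reference.  The collector is switched off for the
duration so it can't run on its own in the middle of the demonstration.

```python
import gc
import weakref

class Node:
    """Hypothetical object that will be made to refer to itself."""
    pass

gc.disable()                  # keep the collector from running by itself
node = Node()
node.me = node                # the object now refers to itself
watcher = weakref.ref(node)

del node                      # refcount can't reach zero: the cycle holds it
alive_after_del = watcher() is not None

gc.collect()                  # the cycle detector finds and frees it
alive_after_gc = watcher() is not None
gc.enable()

print(alive_after_del, alive_after_gc)
```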


>  But this is probably wrong.  Below is a very generic version of what I am doing.  I hope there is something obvious I am doing wrong or not doing which I can to dump the memory in each cycle of the innermost loop.  Hopefully, what I have below is meaningful enough, but again, I am new to this, so we shall see.
>
> ################################################
> # generic code skeleton
> # import a class I wrote to utilize the 3rd party software
> import my_class
>
> # instantiate the function do_stuff
> my_func = my_class.do_stuff()

So this is a class-static method which returns a callable object?  One 
with methods of its own?

>
> # I am manipulating a data array of size ~ 300 x 256 x 1 x 2
> data = my_data  # my_data is imported just once and has the size above
>
> # instantiate a 3d array of size 20 x 10 x 10 and fill it with all zeros
> my_array = numpy.zeros([20,10,10])
> # loop over parameters and fill array with desired output
> for i in range(loop_1):
>      for j in range(loop_2):
>          for k in range(loop_3):
>              # create tmp_data that has a shape which is the same as data except the first dimension can range from 1 - 1024 instead of being fixed at 300
>
>              '''  Is the next line where I am causing memory problems? '''

Hard to tell.  Is chop_data() a trivial function you could have posted?
It's a class method, not an instance method.  Is it keeping references
to the data it's returning?  Perhaps for caching purposes?

>              tmp_data = my_class.chop_data(data,i,j,k)
>              my_func(tmp_data)
>              my_func.third_party_function()
>              my_array[i,j,k] = my_func.results() # this is just a floating point number
>
>              ''' should I do something to flush tmp_data? '''

You don't show us any code that would cause me to suspect tmp_data.

> #############################################
>

You leave out so much that it's hard to know what parts to ask you to
post.  If data is a numpy array, and my_class.chop_data is a class
method, perhaps you could post that class method.

Do you have a tool for your OS that lets you examine memory usage 
dynamically?  If you do, sometimes it's instructive to watch while a 
program is running to see what the dynamics are.
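
Besides an OS tool, one option from inside the program is the standard
library's tracemalloc module.  That assumes Python 3.4 or later, and it
only counts allocations that go through Python's own allocators, so
memory grabbed directly by an extension library may not show up.  A
rough sketch:

```python
import tracemalloc

tracemalloc.start()

# Allocate roughly 1 MB in small pieces, then look at the counters.
blocks = [bytearray(1024) for _ in range(1000)]
current_with, peak = tracemalloc.get_traced_memory()

del blocks
current_without, peak = tracemalloc.get_traced_memory()

tracemalloc.stop()
print(current_with, current_without, peak)
```

Calling get_traced_memory() once per pass of the outer loop would show
whether usage climbs steadily or levels off.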

Note that Python, like nearly any other program written with the C
library, will not necessarily free memory all the way to the OS at any
particular moment in time.  If you (as a C programmer) were to malloc() a
megabyte block and immediately free it, you might not see the free
externally, but new allocations would instead be carved out of that
freed block.

Those specifics vary with the OS and with the C library, and they may
very well vary with the size of the block.  Thus individual blocks over
a certain size may be allocated directly from the OS, and freed
immediately when done, while smaller blocks are coalesced in the library
and reused over and over.

-- 
DaveA

