[Tutor] memory consumption
Steven D'Aprano
steve at pearwood.info
Wed Jul 3 20:55:40 CEST 2013
On 04/07/13 04:17, Andre' Walker-Loud wrote:
> Hi All,
>
> I wrote some code that is running out of memory.
How do you know? What are the symptoms? Do you get an exception? Computer crashes? Something else?
> It involves a set of three nested loops, manipulating a data file (array) of dimension ~ 300 x 256 x 1 x 2.
Is it a data file, or an array? They're different things.
> It uses some third party software, but my guess is I am just not aware of how to use proper memory management and it is not the 3rd party software that is the culprit.
As a general rule, you shouldn't need to worry about such things, at least 99% of the time.
> Memory management is new to me, and so I am looking for some general guidance. I had assumed that reusing a variable name in a loop would automatically flush the memory by just overwriting it. But this is probably wrong. Below is a very generic version of what I am doing. I hope there is something obvious I am doing wrong or not doing which I can to dump the memory in each cycle of the innermost loop. Hopefully, what I have below is meaningful enough, but again, I am new to this, so we shall see.
Completely non-meaningful.
> ################################################
> # generic code skeleton
> # import a class I wrote to utilize the 3rd party software
> import my_class
Looking at the context here, "my_class" is a misleading name, since it's actually a module, not a class.
> # instantiate the function do_stuff
> my_func = my_class.do_stuff()
This is getting confusing. Either you've oversimplified your pseudo-code, or you're using words in ways that do not agree with standard terminology. Or both. You don't instantiate functions, you instantiate a class, which gives you an instance (an object), not a function.
So I'm lost here -- I have no idea what my_class is (possibly a module?), or do_stuff (possibly a class?) or my_func (possibly an instance?).
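To pin down the terminology, here is a minimal sketch of a class, an instance, and a method. The names DoStuff, worker and results are made up for illustration; I am only guessing that your real code has roughly this shape.

```python
# Hypothetical names throughout; this is not the original poster's code.

class DoStuff:
    """A class: a blueprint for creating objects."""

    def __init__(self):
        self.data = None

    def __call__(self, data):
        # Defining __call__ lets an instance be called like a function.
        self.data = data

    def results(self):
        # A method: a function attached to the class.
        return float(len(self.data))


worker = DoStuff()          # instantiate the class -> an instance
worker([1, 2, 3])           # call the instance (this runs __call__)
value = worker.results()    # call a method on the instance

print(type(DoStuff).__name__)   # 'type': DoStuff is a class
print(type(worker).__name__)    # 'DoStuff': worker is an instance
print(value)                    # 3.0
```

If your code looks something like this, then "my_func" is an instance that happens to be callable, not a function.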
> # I am manipulating a data array of size ~ 300 x 256 x 1 x 2
> data = my_data # my_data is imported just once and has the size above
Where, and how, is my_data imported from? What is it? You say it is "a data array" (what sort of data array?) of size 300x256x1x2 -- that's a four-dimensional array, with 153600 entries. What sort of entries? Is that 153600 bytes (about 150K) or 153600 x 64-bit floats (about 1.2 MB)? Or 153600 data structures, each one holding 1MB of data (about 153 GB)?
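The arithmetic is easy to check for yourself. This is only a back-of-the-envelope sketch: the three entry sizes are my assumptions, since I don't know what your array actually holds.

```python
# Rough memory estimate for a 300 x 256 x 1 x 2 array.
# The per-entry sizes below are assumptions, not facts about the real data.
from math import prod

shape = (300, 256, 1, 2)
n_entries = prod(shape)          # 153600

bytes_per_entry = {
    "1-byte integers": 1,
    "64-bit floats": 8,
    "1 MB structures": 10**6,
}

# Total bytes for each assumed entry type.
estimates = {kind: n_entries * size for kind, size in bytes_per_entry.items()}

for kind, total in estimates.items():
    print(f"{kind:>16}: {total:>15,} bytes = {total / 10**6:,.1f} MB")
```

If the data turns out to be a numpy array, its dtype and nbytes attributes will tell you the entry type and total size directly, with no guessing.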
> # instantiate a 3d array of size 20 x 10 x 10 and fill it with all zeros
> my_array = numpy.zeros([20,10,10])
At last, we finally see something concrete! A numpy array. Is this the same sort of array used above?
> # loop over parameters and fill array with desired output
> for i in range(loop_1):
>     for j in range(loop_2):
>         for k in range(loop_3):
How big are loop_1, loop_2, loop_3?
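If this is Python 2, you should consider using xrange() rather than range(). range() builds the entire list in memory before the loop starts, while xrange() produces the numbers one at a time, so for very large loop counts xrange is more memory efficient. (In Python 3, range() already behaves this way.)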
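You can see the difference between a lazy range and a full list of numbers by measuring both. This sketch is written for Python 3, where range() is lazy like Python 2's xrange() and list(range(n)) stands in for what Python 2's range() builds. The value of n is arbitrary.

```python
import sys

n = 1_000_000

lazy = range(n)          # Python 3 range: lazy, like Python 2's xrange
eager = list(range(n))   # a real list of n ints, like Python 2's range()

lazy_size = sys.getsizeof(lazy)
# getsizeof counts only the list itself, not the int objects it refers to,
# so the true cost of the list is even larger than reported here.
eager_size = sys.getsizeof(eager)

print("lazy range:", lazy_size, "bytes")
print("full list: ", eager_size, "bytes")
```

Unless your loop counts run into the millions, this is unlikely to be what is eating your memory, but it costs nothing to rule out.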
>             # create tmp_data that has a shape which is the same as data except the first dimension can range from 1 - 1024 instead of being fixed at 300
>             ''' Is the next line where I am causing memory problems? '''
>             tmp_data = my_class.chop_data(data,i,j,k)
How can we possibly tell if chop_data is causing memory problems when you don't show us what chop_data does?
>             my_func(tmp_data)
>             my_func.third_party_function()
Again, no idea what they do.
>             my_array([i,j,k]) = my_func.results() # this is just a floating point number
>             ''' should I do something to flush tmp_data? '''
No. Python will automatically garbage collect it as needed.
Well, that's not quite true. It depends on what tmp_data actually is. So, *probably* no. But without seeing the code that creates tmp_data, I cannot be sure.
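Here is a small sketch of both cases. The Chunk class and the cache list are hypothetical stand-ins, since I haven't seen what chop_data returns or what the third-party code does with it. The results in the comments are what CPython gives, because it frees an object as soon as its last reference goes away; other Python implementations may free it later.

```python
import weakref


class Chunk:
    """Hypothetical stand-in for whatever chop_data returns."""

    def __init__(self, n):
        self.payload = bytearray(n)


# Case 1: nothing else refers to the old object.
tmp_data = Chunk(10**6)
watcher = weakref.ref(tmp_data)     # observes the object without keeping it alive

tmp_data = Chunk(10**6)             # rebind the name, as the loop does
freed_after_rebind = watcher() is None

# Case 2: something else holds on to the old object.
cache = []                          # e.g. a library that keeps its inputs around
tmp_data = Chunk(10**6)
cache.append(tmp_data)
watcher = weakref.ref(tmp_data)

tmp_data = Chunk(10**6)             # rebinding no longer frees the old object
freed_while_cached = watcher() is None

print(freed_after_rebind)    # True on CPython
print(freed_while_cached)    # False: the cache still refers to it
```

So if memory keeps growing on every pass through the loop, the first thing to look for is something that keeps references to the old objects, such as a list or cache inside my_func or the third-party code.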
--
Steven