[Tutor] memory consumption
Andre' Walker-Loud
walksloud at gmail.com
Wed Jul 3 21:50:38 CEST 2013
Hi Steven, Dave, Allen and All,
OK - forgive my poor terminology. That is also something I am learning.
The three of you asked very similar questions - so I will just reply this once.
>> I wrote some code that is running out of memory.
>
> How do you know? What are the symptoms? Do you get an exception? Computer crashes? Something else?
I am using Mac OS, and have the "Activity Monitor" open, so I can watch the memory slowly get consumed. Then, about 2/3 of the way through, it crashes when the memory runs out:
Python(50827,0xa051c540) malloc: *** mmap(size=16777216) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Bus error
I have 8GB of RAM, and the underlying data file I start with is small (see below).
>> It involves a set of three nested loops, manipulating a data file (array) of dimension ~ 300 x 256 x 1 x 2.
>
> Is it a data file, or an array? They're different things.
OK. I begin with a data file (hdf5 to be specific). I load a piece of the data file into memory as a numpy array. The size of the array is
300 x 256 x 1 x 2
and each element is a double precision float, so the total array is ~1.2 MB (153600 elements x 8 bytes)
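A quick way to check the size directly in numpy (nothing assumed here beyond the shape and dtype stated above):

```python
import numpy as np

# an array with the stated shape and double-precision elements
a = np.zeros((300, 256, 1, 2), dtype=np.float64)

print(a.size)    # 300 * 256 * 1 * 2 = 153600 elements
print(a.nbytes)  # 153600 * 8 bytes = 1228800 bytes, about 1.2 MB
```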
My code is not in the boiled-down-to-simplest-failing-state, which is why I attempted to just describe. But I will get more specific below.
>> ################################################
>> # generic code skeleton
>> # import a class I wrote to utilize the 3rd party software
>> import my_class
>
> Looking at the context here, "my_class" is a misleading name, since it's actually a module, not a class.
Yes. In "my_class" I wrote a class. I think the essentials are
##########################
# my_class # perhaps a poor name, but I will stick with it for the re-post
'''
The third_party software performs numerical minimization. It has a function that constructs the function to be minimized, and then a routine to actually do the minimization. It minimizes a chi^2 with respect to "other_vars"
'''
import third_party
class do_stuff:
    # I am aware this doesn't follow the class naming convention,
    # just sticking with my previous post name
    def __call__(self, data, other_vars):
        # note: __call__ needs self as its first parameter
        self.fit = third_party.function_set_up(data, other_vars)
    def minimize(self):
        try:
            self.fit.minimize()
            self.have_fit = True
        except third_party.Error:
            self.have_fit = False
##########################
I should say the above construction works as I expect when I do a single fit.
>> # instantiate the function do_stuff
>> my_func = my_class.do_stuff()
[snip]
>> # I am manipulating a data array of size ~ 300 x 256 x 1 x 2
>> data = my_data # my_data is imported just once and has the size above
>
> Where, and how, is my_data imported from? What is it? You say it is "a data array" (what sort of data array?) of size 300x256x1x2 -- that's a four-dimensional array, with 153600 entries. What sort of entries? Is that 153600 bytes (about 150K) or 153600 x 64-bit floats (about 1.3 MB)? Or 153600 data structures, each one holding 1MB of data (about 153 GB)?
see above - each element is a double precision (64-bit) float, so ~1.2 MB.
>> # instantiate a 3d array of size 20 x 10 x 10 and fill it with all zeros
>> my_array = numpy.zeros([20,10,10])
>
> At last, we finally see something concrete! A numpy array. Is this the same sort of array used above?
>
>
>> # loop over parameters and fill array with desired output
>> for i in range(loop_1):
>> for j in range(loop_2):
>> for k in range(loop_3):
>
> How big are loop_1, loop_2, loop_3?
The sizes of the loops are not big:
loop_1 = 20
loop_2 = 10
loop_3 = 10
> You should consider using xrange() rather than range(). If the number is very large, xrange will be more memory efficient.
>
>
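Good to know. As I understand it, in Python 2 range() builds the whole list in memory while xrange() yields values lazily; for loops this small it should make no practical difference. (In Python 3, range() already behaves lazily.) A quick way to see the difference, using sys.getsizeof:

```python
import sys

# In Python 3, range() is a lazy object of constant size;
# materializing it as a list is what actually costs memory.
lazy = range(1000000)
materialized = list(range(1000000))

print(sys.getsizeof(lazy))          # small, constant-size object
print(sys.getsizeof(materialized))  # megabytes for a million entries
```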
>> # create tmp_data that has a shape which is the same as data except the first dimension can range from 1 - 1024 instead of being fixed at 300
>> ''' Is the next line where I am causing memory problems? '''
>> tmp_data = my_class.chop_data(data,i,j,k)
>
> How can we possibly tell if chop_data is causing memory problems when you don't show us what chop_data does?
OK, below is chop_data
###################
import numpy as np
def chop_data(data, N, M):
    n_orig = data.shape[0]  # data is a numpy array; data.shape gives the size of each dimension
    data_new = []
    for b in range(N):
        # draw M random samples with replacement from the original data and take the mean
        # np.random.randint(0, n_orig, M) creates an array of M integers in [0, n_orig)
        data_new.append(data[np.random.randint(0, n_orig, M)].mean(axis=0))
    data_new = np.array(data_new)  # convert to a numpy array
    return data_new
###################
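As an aside, the loop in chop_data can also be written as one fancy-indexing operation, which avoids growing a Python list of intermediate arrays (a sketch assuming the same shapes as above; chop_data_vec is my name for it, not part of the original code):

```python
import numpy as np

def chop_data_vec(data, N, M):
    # draw an (N, M) block of random indices in one shot, then
    # average over the M axis; equivalent to the loop version
    n_orig = data.shape[0]
    idx = np.random.randint(0, n_orig, (N, M))
    return data[idx].mean(axis=1)

# quick shape check on dummy data of the size described above
data = np.zeros((300, 256, 1, 2))
out = chop_data_vec(data, 8, 16)
print(out.shape)  # (8, 256, 1, 2)
```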
To be clear, I will re-write the loop I had in the original post
###################
import my_class
import numpy as np

my_func = my_class.do_stuff()
other_vars = np.array([''' list of guess initial values for numerical minimization '''])
loop_lst = np.array([1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024])
n_sample = 20
my_array = np.zeros([n_sample, len(loop_lst), len(loop_lst)])
for s in range(n_sample):
    for n_i, n in enumerate(loop_lst):
        for m_i, m in enumerate(loop_lst):
            tmp_data = my_class.chop_data(data, n, m)
            my_func(tmp_data, other_vars)
            my_func.minimize()
            # results() is a built-in third_party function and returns a 64-bit float
            my_array[s, n_i, m_i] = my_func.fit.results()
This all runs as expected, except that each pass through the loops consumes more and more memory. Two-thirds of the way through the outer loop, it crashes with the memory error quoted above.
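In case it is useful for diagnosing this, here is the kind of check I can run to see which lines are retaining memory across iterations (a debugging sketch using the standard-library tracemalloc module, available in Python 3; the growing list here is just a stand-in for whatever state accumulates each pass):

```python
import tracemalloc

tracemalloc.start()

leaked = []  # stand-in for state that grows on every pass through the loop
snap1 = tracemalloc.take_snapshot()
for _ in range(100):
    # simulate per-iteration allocations that are never freed
    leaked.append([0.0] * 1000)
snap2 = tracemalloc.take_snapshot()

# the top entries point at the source lines where retained memory was allocated
for stat in snap2.compare_to(snap1, 'lineno')[:3]:
    print(stat)
```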
Hopefully, this is more clear.
Andre