[Tutor] memory consumption

Oscar Benjamin oscar.j.benjamin at gmail.com
Thu Jul 4 01:24:18 CEST 2013


On 3 July 2013 23:37, Andre' Walker-Loud <walksloud at gmail.com> wrote:
> Hi Oscar,

Hi Andre',

(your name shows in my email client with an apostrophe ' after it; I'm
not sure if I'm supposed to include that when I write it).

>> The error is for creating an mmap object. This is not something that
>> numpy does unless you tell it to.
>
> I have not explicitly told it to.
>
>> So you're using pytables or h5py, or something else? It really would
>> help if you would specify this instead of trying to be generic.
>
> All the third party software I am using:
> numpy
> pytables
> hdf5

What do you mean above by hdf5? Is that a software library or just the
format? I'll assume you're using that format and that you're using the
pytables library to access it.

> pyminuit [python interface to Minuit]
> Minuit  [Minuit - a C++ code base developed at CERN for numerical minimization]

Okay, this is much better.

> I was hoping I was just making some obvious mistake, assuming people are mostly not familiar with much of any of the specific code I am using.  But I guess it is either not so obvious, or my explanations are too poor.

I wouldn't be surprised if pytables had a memory leak in it. Although
I'd be less surprised if you were using it incorrectly and that
resulted in a memory leak.

>> My guess is that the hdf5 library loads the array as an mmap'ed memory
>> block and you're not actually working with an ordinary numpy array
>> (even if it has a similar interface).
>
> I specifically load the data once, as
>
> my_file = tables.openFile(my_data_file.h5)
> my_data = my_file.getNode(path_to_data).read()
>
> after this, "my_data" seems to have all the features of a numpy array.  for example,
>
>> Have you checked the actual memory size of the array? If it's a real
>> numpy array you can use the nbytes attribute:
>> >>> a = numpy.zeros([300, 256, 1, 2], float)
>> >>> a.nbytes
>> 1228800
>
> this works on my_data.

Okay, so what happens if you don't import pytables, don't load the
my_data_file.h5 file and just say

    data = numpy.zeros([300, 256, 1, 2], float)

and run the rest of your code? Do you still get a memory leak?
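
A minimal sketch of that test, assuming the rest of your code only
needs an array of the right shape and dtype (load_dummy_data is a
made-up name of mine, not anything from your program):

```python
import numpy

def load_dummy_data():
    # Hypothetical stand-in for the pytables load: same shape and dtype
    # as the example array above, but an ordinary in-memory numpy array
    # with no file behind it.
    return numpy.zeros([300, 256, 1, 2], float)

data = load_dummy_data()
print(type(data), data.nbytes)
```

If the leak disappears with this array then the loading step is
implicated; if it persists then pytables is probably not the cause.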

>>> class do_stuff:
>>> # I am aware this doesn't follow the class naming convention, just sticking with my previous post name
>>>    def __call__(self, data, other_vars):
>>>        self.fit = third_party.function_set_up(data,other_vars)
>>>
>>>    def minimize(self):
>>>        try:
>>>            self.fit.minimize()
>>>            self.have_fit = True
>>>        except third_party.Error:
>>>            self.have_fit = False
>>> ##########################
>>
>> If you write code like the above then you really cannot expect other
>> people to just understand what you mean if you don't show them the
>> code. Specifically the use of __call__ is confusing. Really, though,
>> this class is just a distraction from your problem and should have
>> been simplified away.
>
> Off my main topic, but could you explain more?
> I am also not very experienced writing classes, and was learning from an example.  So I am not sure why __call__ is confusing.  I thought that was correct.

You're using __call__ correctly. What I mean is that while it may (or
may not) be appropriate for you to use __call__ in your program it
seems inappropriate in the simplified example. I expect in a
genericised example that `my_func` is a real function. I don't expect
it to have a minimize method and I expect it to be imported rather
than created. To quote from Dave and Alan earlier in the thread:

On 3 July 2013 19:55, Dave Angel <davea at davea.name> wrote:
> On 07/03/2013 02:17 PM, Andre' Walker-Loud wrote:
>>
>> # instantiate the function do_stuff
>> my_func = my_class.do_stuff()
>
> So this is a class-static method which returns a callable object?  One with
> methods of its own?

Also:
On 3 July 2013 19:57, Alan Gauld <alan.gauld at btinternet.com> wrote:
> On 03/07/13 19:17, Andre' Walker-Loud wrote:
>
>> # instantiate the function do_stuff
>> my_func = my_class.do_stuff()
>
> You don't instantiate functions you call them. You are setting my_func to be
> the return value of do_stuff(). What is that return value? What does my_func
> actually refer to?


>>>            tmp_data = my_class.chop_data(data,n,m)
>>
>> Where did data come from? Is that the mmap'ed array from the hdf5 library?
>
> as above, data is loaded once with pytables, using the .read() function.
> Subsequently, with my "chop_data" function, I believe that returns a numpy array, and I have not explicitly asked it anything about "mmap" so I am not sure.  How would you check?

If you haven't asked for an mmap then you won't get one from any numpy
operations. However pytables creates mmap'ed arrays that behave like
numpy arrays so I believe that's where the problem comes in. What
happens if you do something like:

>>> print(type(data))
<type 'numpy.ndarray'>
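
The type alone may not settle it, since an ordinary ndarray can still
be a view onto mmap'ed memory. One rough way to check, as a sketch: walk
the array's .base chain and look for a numpy.memmap or mmap.mmap object.
The helper name is_mmap_backed is my own invention, and it only
recognises those two types, so a False result is a hint rather than
proof.

```python
import mmap
import os
import tempfile

import numpy

def is_mmap_backed(a):
    """Return True if `a`, or anything in its .base chain, is an mmap."""
    obj = a
    while obj is not None:
        if isinstance(obj, (numpy.memmap, mmap.mmap)):
            return True
        # Views keep a reference to the object that owns their memory.
        obj = getattr(obj, "base", None)
    return False

# An ordinary array owns its memory, so the chain ends immediately.
plain_result = is_mmap_backed(numpy.zeros([300, 256, 1, 2], float))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "scratch.dat")
    mapped = numpy.memmap(path, dtype=float, mode="w+", shape=(4, 4))
    view = numpy.asarray(mapped)[1:]  # plain ndarray, but mmap underneath
    mapped_result = is_mmap_backed(mapped)
    view_result = is_mmap_backed(view)
    del view, mapped  # release the mapping before the directory is removed

print(plain_result, mapped_result, view_result)
```

You could call is_mmap_backed(data) right after the .read() call and
again on the result of chop_data.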

>>>            my_func(tmp_data,other_vars)
>>>            my_func.minimize()
>>
>> I now know that the above two lines call third_party.function_set_up()
>> and my_func.fit.minimize(). I still have no idea what they do though.
>
> I was trying to avoid that, since I suspect(ed) the problem is with me and not the third party.
> Without going into specifics, the first function constructs a chi^2 function which looks like
>
> chisq = sum_i ( (data[i] - fit_func(i,fit_params)) / data_error[i] )**2
>
> the second function works to numerically minimize chisq with respect to the "fit_params" which are a 1d array.

What happens if you just comment those lines out? Or if you replace
them with dummy lines that don't do anything interesting? This is how
you'll narrow down the problem.
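
As a sketch of what I mean (Python 3 only, since it uses tracemalloc):
keep the loop, swap the fitting step for a dummy, and record memory
after each iteration. Here chop_data, dummy_fit and leaky_fit are
placeholders I made up, not your functions. Note that tracemalloc only
sees allocations made through Python's allocators (recent numpy
versions report theirs to it), so a leak inside a C++ library such as
Minuit would not show up this way.

```python
import tracemalloc

import numpy

def chop_data(data, n, m):
    # Placeholder for my_class.chop_data: copy out a slice of the data.
    return data[n:m].copy()

def dummy_fit(tmp_data, other_vars):
    # Dummy replacement for the two fitting lines: does nothing.
    return None

_kept = []

def leaky_fit(tmp_data, other_vars):
    # Deliberately leaky variant, for comparison: keeps every array alive.
    _kept.append(tmp_data)

def run_loop(data, fit, n_iter=50):
    """Run the loop and return traced memory (bytes) after each iteration."""
    tracemalloc.start()
    usage = []
    for i in range(n_iter):
        tmp_data = chop_data(data, 0, 100)
        fit(tmp_data, None)
        current, peak = tracemalloc.get_traced_memory()
        usage.append(current)
    tracemalloc.stop()
    return usage

data = numpy.zeros([300, 256, 1, 2], float)
clean = run_loop(data, dummy_fit)
leaky = run_loop(data, leaky_fit)
print("dummy growth:", clean[-1] - clean[10])
print("leaky growth:", leaky[-1] - leaky[10])
```

If memory stays flat with the dummy but grows once the real calls are
put back, the fitting step is where to look next.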

> If you can help me understand the issue of mmap (whether somehow I am creating this unwittingly), that would be great.  ie, what tests can I perform to check?

> Otherwise, it seems perhaps the best thing for me to do now is take eryksun's advice and learn how to use a memory profiler.

I'm not so sure if you'll want to go down that path just yet. I
personally expect that I could find the cause of your problem if I was
running your code on my computer and I've never used a memory profiler
before (alternatively that could mean that I'm not qualified to advise
on this point though).

Based on the described list of software and your statement that you're
not explicitly creating mmaps yourself my suspicion points at pytables
(or your usage of pytables). Try using a dummy array and eliminating
pytables from your program. If pytables is a necessary part of the
memory leak problem then you should probably go to the pytables-users
mailing list for more help. If the issue is unrelated to pytables then
you're a few steps on your way to simplifying it into a simple but
complete example that you can post here.

Really though I'm guessing as I still can't see a complete example
that demonstrates the problem. Creating a Short, Self-Contained,
Correct Example (http://sscce.org/) is a useful thing to do; the
process will more often than not enable you to discover the root cause
of your problem or at least to fix the problem. If it does not then it
gives you something to post to a mailing list that someone else can
easily analyse and then tell you what the problem is.


Oscar

