[Numpy-discussion] Memory Usage Question

Tom Kuiper kuiper at jpl.nasa.gov
Tue Jun 8 12:22:41 EDT 2010


Dear Eric,

thank you for the insight and suggestion.  Reading between the lines, I 
developed the suspicion that the problem might be in the extension 
function 'unpack_vdr_data'.  Previously, the last part of that function was:

      Cmplx = PyComplex_FromCComplex(cmplx);                  /* new reference */
      if (PyList_SetItem(Clist, time_point_count, Cmplx)) {   /* steals Cmplx */
        fprintf(stderr,"Could not set list item %d\n",time_point_count);
      }
      time_point_count++;
    }
  }
  return Py_BuildValue("O", Clist);   /* "O" adds a reference to Clist */
}

Your response made me realize that a reference to Clist was being retained 
on every call: Py_BuildValue("O", Clist) adds its own reference to the list, 
and the function's original reference was never released, so the whole list, 
and every complex object in it, stayed alive after each call.  The following 
works much better:

  PyObject * result;
  ....
      Cmplx = PyComplex_FromCComplex(cmplx);
      if (diag) {
        printf("Created Python complex object %d\n", time_point_count);
      }
      time_point_count++;
    }
  }
  result = Py_BuildValue("O", Clist);   /* result holds its own reference to Clist */
  Py_CLEAR(Cmplx);    /* drop the references held locally ... */
  Py_CLEAR(Clist);    /* ... so only 'result' keeps the list alive */
  return result;
}

Memory still grows, but much more slowly and, for my purposes, it's no 
longer a problem.
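
For reference, the ownership pattern the fix is aiming at can be boiled down 
to the toy function below.  This is only a sketch, not the real 
unpack_vdr_data: it builds a list of n complex values, lets PyList_SET_ITEM 
steal each item reference, and returns the list itself, so the caller ends up 
holding the only reference and nothing is left behind in the extension.

  #include <Python.h>

  /* Toy example (not the real unpack_vdr_data): build a list of n complex
     numbers and hand the caller our only reference to it. */
  static PyObject *
  build_complex_list(Py_ssize_t n)
  {
    PyObject *list = PyList_New(n);                    /* new reference */
    if (list == NULL)
      return NULL;
    for (Py_ssize_t i = 0; i < n; i++) {
      Py_complex c;
      c.real = (double) i;
      c.imag = 0.0;
      PyObject *item = PyComplex_FromCComplex(c);      /* new reference */
      if (item == NULL) {
        Py_DECREF(list);                               /* also frees items set so far */
        return NULL;
      }
      PyList_SET_ITEM(list, i, item);                  /* steals the item reference */
    }
    return list;   /* caller now owns the only reference; no Py_BuildValue needed */
  }

Returning the list directly avoids the extra reference that 
Py_BuildValue("O", ...) creates and that then has to be dropped by hand.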

I will use the report_memory function to research this a little more.
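
Concretely, the kind of standalone check I have in mind is sketched below, 
following your suggestion: fake data from numpy.random.randn in place of 
get_raw_record/unpack_vdr_data, and a small Linux-only memory report 
(modelled on the report_memory function quoted below, rss_kb here is just a 
placeholder name) printed at every outer iteration.  It is meant to be run 
from the command line rather than from ipython.

  import os
  from subprocess import Popen, PIPE

  import numpy as np

  def rss_kb():
      'resident set size of this process in kB (Linux only)'
      out = Popen('ps -p %d -o rss' % os.getpid(), shell=True,
                  stdout=PIPE).stdout.readlines()
      return int(out[-1].strip())

  nsecs = 255
  fft_size = 1000
  ksamps_per_sec = 1000              # 1e6 complex samples per fake record
  P = np.zeros(fft_size)
  for i in range(nsecs):
      # fake data instead of the VDR reader/unpacker
      n = ksamps_per_sec * 1000
      cmplx_samples = np.random.randn(n) + 1j*np.random.randn(n)
      for j in range(n // fft_size):
          index = j * fft_size
          S = np.fft.fft(cmplx_samples[index:index + fft_size])
          P += (S * np.conjugate(S)).real   # accumulate power; keep P real
      del cmplx_samples
      print(i, rss_kb())                    # memory check each outer iteration
  P /= nsecs
  sample_period = 1. / ksamps_per_sec      # kHz
  f = np.fft.fftfreq(fft_size, d=sample_period)

If memory stays flat with this, the leak is almost certainly in the reader 
or the unpacker rather than in the numpy part.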

Thanks so much,

Tom
> Date: Sun, 06 Jun 2010 15:00:16 -1000
> From: Eric Firing <efiring at hawaii.edu>
> Subject: Re: [Numpy-discussion] memory usage question
> To: numpy-discussion at scipy.org
>
> On 06/06/2010 02:17 PM, Tom Kuiper wrote:
>   
>> Greetings all.
>>
>> I have a feeling that, coming at this with a background in FORTRAN and
>> C, I'm missing some subtlety, possibly of an OO nature.   Basically, I'm
>> looping over very large data arrays and memory usage just keeps growing
>> even though I re-use the arrays.  Below is a stripped down version of
>> what I'm doing.  You'll recognize it as gulping a great quantity of data
>> (1 million complex samples), Fourier transforming them in 1000-sample
>> blocks into spectra, co-adding the spectra, and doing this 255 times,
>> for a grand total of one 1000-point spectrum.  At iteration 108 of the outer
>> loop, I get a memory error.  By then, according to 'top', ipython (or
>> python) is using around 85% of 3.5 GB of memory.
>>
>> nsecs = 255
>> fft_size = 1000
>> P = zeros(fft_size)
>> for i in range(nsecs):
>>     header, data = get_raw_record(fd_in)
>>     num_bytes = len(data)
>>     label, reclen, recver, softver, spcid, vsrid, schanid, bits_per_sample, \
>>         ksamps_per_sec, sdplr, prdx_dss_id, prdx_sc_id, prdx_pass_num, \
>>         prdx_uplink_band, prdx_downlink_band, trk_mode, uplink_dss_id, ddc_lo, \
>>         rf_to_if_lo, data_error, year, doy, sec, data_time_offset, frov, fro, \
>>         frr, sfro, rf_freq, schan_accum_phase, (scpp0, scpp1, scpp2, scpp3), \
>>         schan_label = header
>>     # ksamps_per_sec = 1e3, number of complex samples in 'data' = 1e6
>>     num_32bit_words = len(data)*8/BITS_PER_32BIT_WORD
>>     cmplx_samp_per_word = BITS_PER_32BIT_WORD/(2*bits_per_sample)
>>     cmplx_samples = unpack_vdr_data(num_32bit_words, cmplx_samp_per_word, data)
>>     del(data)  # This makes no difference
>>     for j in range(0, ksamps_per_sec*1000/fft_size):
>>         index = int(j*fft_size)
>>         S = fft(cmplx_samples[index:index+fft_size])
>>         P += S*conjugate(S)
>>     del(cmplx_samples)  # This makes no difference
>>     if (i % 20) == 0:
>>         gc.collect(0)  # This makes no difference
>> P /= nsecs
>> sample_period = 1./ksamps_per_sec  # kHz
>> f = fftfreq(fft_size, d=sample_period)
>>
>> What am I missing?
>>     
>
> I don't know, but I would suggest that you strip the example down
> further: instead of reading data from a file, use numpy.random.randn to
> generate fake data as needed.  In other words, use only numpy
> functions--no readers, no unpackers.  Put this minimal script into a
> file and run it from the command line, not in ipython.  (Have you
> verified that you get the same result running a standalone script from
> the command line as running from ipython?)  Put a memory-monitoring step
> inside, maybe at each outer loop iteration.  You can use the
> matplotlib.cbook.report_memory function or similar:
>
> def report_memory(i=0):  # argument may go away
>      'return the memory consumed by process'
>      import os, sys
>      from subprocess import Popen, PIPE
>      pid = os.getpid()
>      if sys.platform=='sunos5':
>          a2 = Popen('ps -p %d -o osz' % pid, shell=True,
>              stdout=PIPE).stdout.readlines()
>          mem = int(a2[-1].strip())
>      elif sys.platform.startswith('linux'):
>          a2 = Popen('ps -p %d -o rss,sz' % pid, shell=True,
>              stdout=PIPE).stdout.readlines()
>          mem = int(a2[1].split()[1])
>      elif sys.platform.startswith('darwin'):
>          a2 = Popen('ps -p %d -o rss,vsz' % pid, shell=True,
>              stdout=PIPE).stdout.readlines()
>          mem = int(a2[1].split()[0])
>
>      return mem
>
> I'm suspecting the problem may be in your data reader and/or unpacker,
> not in the application of numpy functions.  Also, ipython can confuse
> the issue by keeping references to objects.  In any case, with a simpler
> test script and regular memory monitoring, it should be easier for you
> to track down the problem.
>
> Eric