[Numpy-discussion] memory usage question

Tom Kuiper kuiper at jpl.nasa.gov
Sun Jun 6 20:17:47 EDT 2010


Greetings all.

I have a feeling that, coming at this with a background in FORTRAN and
C, I'm missing some subtlety, possibly of an OO nature. Basically, I'm
looping over very large data arrays, and memory usage just keeps growing
even though I re-use the arrays. Below is a stripped-down version of
what I'm doing. You'll recognize it as gulping a great quantity of data
(1 million complex samples), Fourier transforming these in 1000-sample
blocks into spectra, co-adding the spectra, and doing this 255 times,
for a grand total 1000-point spectrum. At iteration 108 of the outer
loop, I get a memory error. By then, according to 'top', ipython (or
python) is using around 85% of 3.5 GB of memory.

  import gc
  from numpy import zeros, conjugate
  from numpy.fft import fft, fftfreq

  nsecs = 255
  fft_size = 1000
  P = zeros(fft_size)  # accumulated power spectrum
  for i in range(nsecs):
    header,data = get_raw_record(fd_in)
    num_bytes = len(data)
    label, reclen, recver, softver, spcid, vsrid, schanid, bits_per_sample, \
        ksamps_per_sec, sdplr, prdx_dss_id, prdx_sc_id, prdx_pass_num, \
        prdx_uplink_band, prdx_downlink_band, trk_mode, uplink_dss_id, ddc_lo, \
        rf_to_if_lo, data_error, year, doy, sec, data_time_offset, frov, fro, \
        frr, sfro, rf_freq, schan_accum_phase, (scpp0,scpp1,scpp2,scpp3), \
        schan_label = header
    # ksamps_per_sec = 1e3, number of complex samples in 'data' = 1e6
    num_32bit_words = len(data)*8/BITS_PER_32BIT_WORD
    cmplx_samp_per_word = (BITS_PER_32BIT_WORD/(2*bits_per_sample))
    cmplx_samples = unpack_vdr_data(num_32bit_words,cmplx_samp_per_word,data)
    del(data) # This makes no difference
    # Transform each 1000-sample block and co-add the power spectra
    for j in range(0,int(ksamps_per_sec*1000/fft_size)):
      index = int(j*fft_size)
      S = fft(cmplx_samples[index:index+fft_size])
      P += S*conjugate(S)
    del(cmplx_samples) # This makes no difference
    if (i % 20) == 0:
      gc.collect(0) # This makes no difference
  P /= nsecs
  sample_period = 1./ksamps_per_sec # ms, so f below is in kHz
  f = fftfreq(fft_size, d=sample_period)
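
To separate the FFT loop from the record reading, here is a minimal self-contained sketch of the same block-wise co-adding, with synthetic data in place of the real records. The names synthetic_record and coadd_power_spectrum are hypothetical stand-ins introduced only for this illustration; they are not get_raw_record or unpack_vdr_data, and the record length and loop count are reduced. It assumes a NumPy recent enough to provide numpy.random.default_rng.

```python
import numpy as np


def coadd_power_spectrum(samples, fft_size):
    """Sum |FFT|^2 over consecutive fft_size-sample blocks of a 1-D complex array."""
    n_blocks = len(samples) // fft_size
    # View the data as (n_blocks, fft_size); a trailing partial block is dropped.
    blocks = samples[:n_blocks * fft_size].reshape(n_blocks, fft_size)
    spectra = np.fft.fft(blocks, axis=1)
    # |S|^2 is real, so the accumulator can stay a float array.
    return (spectra.real ** 2 + spectra.imag ** 2).sum(axis=0)


def synthetic_record(n_samples, rng):
    """Hypothetical stand-in for reading and unpacking one record of complex samples."""
    return rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples)


fft_size = 1000
nsecs = 5  # 255 in the real run; reduced here to keep the sketch quick
rng = np.random.default_rng(0)

P = np.zeros(fft_size)
for i in range(nsecs):
    cmplx_samples = synthetic_record(100000, rng)
    P += coadd_power_spectrum(cmplx_samples, fft_size)
P /= nsecs
```

Watching 'top' while this runs with the full 255 iterations and 1e6 samples per record would be one way to check whether the transform-and-accumulate step alone shows the growth, or whether it only appears once the real reading and unpacking routines are involved.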

What am I missing?

Best regards

Tom

p.s.  Many of you will see this twice, for which I apologize.


