I have a feeling that, coming at this with a background in FORTRAN and C, I'm missing some subtlety, possibly of an OO nature. Basically, I'm looping over very large data arrays and memory usage just keeps growing even though I reuse the arrays. Below is a stripped-down version of what I'm doing. You'll recognize it as gulping a great quantity of data (1 million complex samples), Fourier transforming these in 1000-sample blocks into spectra, coadding the spectra, and doing this 255 times, for a grand total 1000-point spectrum. At iteration 108 of the outer loop, I get a memory error. By then, according to 'top', ipython (or python) is using around 85% of 3.5 GB of memory.
nsecs = 255
fft_size = 1000
P = zeros(fft_size)
for i in range(nsecs):
    header,data = get_raw_record(fd_in)
    num_bytes = len(data)
    label, reclen, recver, softver, spcid, vsrid, schanid, bits_per_sample, \
    ksamps_per_sec, sdplr, prdx_dss_id, prdx_sc_id, prdx_pass_num, \
    prdx_uplink_band,prdx_downlink_band, trk_mode, uplink_dss_id, ddc_lo, \
    rf_to_if_lo, data_error, year, doy, sec, data_time_offset, frov, fro, \
    frr, sfro,rf_freq, schan_accum_phase, (scpp0,scpp1,scpp2,scpp3), \
    schan_label = header
    # ksamp_per_sec = 1e3, number of complex samples in 'data' = 1e6
    num_32bit_words = len(data)*8/BITS_PER_32BIT_WORD
    cmplx_samp_per_word = (BITS_PER_32BIT_WORD/(2*bits_per_sample))
    cmplx_samples = unpack_vdr_data(num_32bit_words,cmplx_samp_per_word,data)
    del(data) # This makes no difference
    for j in range(0,ksamps_per_sec*1000/fft_size):
        index = int(j*fft_size)
        S = fft(cmplx_samples[index:index+fft_size])
        P += S*conjugate(S)
    del(cmplx_samples) # This makes no difference
    if (i % 20) == 0:
        gc.collect(0) # This makes no difference
P /= nsecs
sample_period = 1./ksamps_per_sec # kHz
f = fftfreq(fft_size, d=sample_period)
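For reference, the block-wise transform and coadd described above can also be written without the inner Python loop. This is only a minimal sketch under assumptions: the function name coadd_power is hypothetical, the record is assumed to be a one-dimensional complex numpy array, and synthetic data stands in for the unpacked samples. It accumulates |S|^2 as a real array rather than the complex product S*conjugate(S).

```python
import numpy as np

def coadd_power(cmplx_samples, fft_size):
    """Sum the power spectra of consecutive fft_size-sample blocks (hypothetical helper)."""
    n_blocks = len(cmplx_samples) // fft_size
    # Arrange the record as (n_blocks, fft_size); trailing samples are dropped.
    blocks = np.asarray(cmplx_samples[:n_blocks * fft_size]).reshape(n_blocks, fft_size)
    # One FFT per row, i.e. per block.
    spectra = np.fft.fft(blocks, axis=1)
    # |S|^2 is real, so the accumulated spectrum is a float array.
    return (spectra.real**2 + spectra.imag**2).sum(axis=0)

if __name__ == "__main__":
    fft_size = 1000
    n = 100 * fft_size
    # Synthetic complex samples, an assumption standing in for real data.
    samples = np.random.randn(n) + 1j * np.random.randn(n)
    P = coadd_power(samples, fft_size)
    print(P.shape, P.dtype)
```

The reshape trades the inner loop for one temporary array holding all the spectra of a record, so peak memory per record is higher than in the looped version, although nothing is kept between records.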
I don't know, but I would suggest that you strip the example down further: instead of reading data from a file, use numpy.random.randn to generate fake data as needed. In other words, use only numpy functions: no readers, no unpackers. Put this minimal script into a file and run it from the command line, not in ipython. (Have you verified that you get the same result running a standalone script from the command line as running from ipython?) Put a memory-monitoring step inside, maybe at each outer loop iteration. You can use the matplotlib.cbook.report_memory function or similar:
import os
import sys

def report_memory(i=0):  # argument may go away
    'return the memory consumed by process'
    from subprocess import Popen, PIPE
    pid = os.getpid()
    if sys.platform=='sunos5':
        a2 = Popen('ps -p %d -o osz' % pid, shell=True,
                   stdout=PIPE).stdout.readlines()
        mem = int(a2[1].strip())
    elif sys.platform.startswith('linux'):
        a2 = Popen('ps -p %d -o rss,sz' % pid, shell=True,
                   stdout=PIPE).stdout.readlines()
        mem = int(a2[1].split()[1])
    elif sys.platform.startswith('darwin'):
        a2 = Popen('ps -p %d -o rss,vsz' % pid, shell=True,
                   stdout=PIPE).stdout.readlines()
        mem = int(a2[1].split()[0])

    return mem
I suspect the problem is in your data reader and/or unpacker, not in the application of numpy functions. Also, ipython can confuse the issue by keeping references to objects. In any case, with a simpler test script and regular memory monitoring, it should be easier for you to track down the problem.
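A minimal test script along these lines might look like the sketch below. It is an illustration under assumptions, not the original program: the names peak_rss and run are hypothetical, the sizes are scaled down, and memory is read with the standard-library resource module (Unix only) instead of report_memory. Note that ru_maxrss reports the peak resident size, in kilobytes on Linux and bytes on macOS, so a value that levels off after the first iteration or two suggests the numpy part is not leaking.

```python
import resource  # standard library, Unix only

import numpy as np

def peak_rss():
    """Peak resident set size of this process (kB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def run(nsecs=5, n_samples=100000, fft_size=1000):
    """Coadd block power spectra of fake data, recording peak memory per record."""
    P = np.zeros(fft_size)
    history = []
    for i in range(nsecs):
        # Fake record standing in for the file reader and unpacker.
        samples = np.random.randn(n_samples) + 1j * np.random.randn(n_samples)
        for j in range(n_samples // fft_size):
            index = j * fft_size
            S = np.fft.fft(samples[index:index + fft_size])
            # S*conjugate(S) has zero imaginary part; keep only the real part.
            P += (S * np.conjugate(S)).real
        history.append(peak_rss())
        print(i, history[-1])
    return P / nsecs, history

if __name__ == "__main__":
    run()
```

Saved to a file and run with plain python from the command line, with nsecs=255 and n_samples=1000000 to match the sizes in the question, this exercises only numpy. If memory stays flat here but grows in the full program, the reader or unpacker is the likelier place to look.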
Eric Firing

Tom Kuiper