
Greetings all.
I have a feeling that, coming at this with a background in FORTRAN and C, I'm missing some subtlety, possibly of an OO nature. Basically, I'm looping over very large data arrays and memory usage just keeps growing even though I re-use the arrays. Below is a stripped-down version of what I'm doing. You'll recognize it as gulping a great quantity of data (1 million complex samples), Fourier transforming these in 1000-sample blocks into spectra, co-adding the spectra, and doing this 255 times, for a grand total 1000-point spectrum. At iteration 108 of the outer loop, I get a memory error. By then, according to 'top', ipython (or python) is using around 85% of 3.5 GB of memory.
nsecs = 255
fft_size = 1000
P = zeros(fft_size)
for i in range(nsecs):
    header, data = get_raw_record(fd_in)
    num_bytes = len(data)
    label, reclen, recver, softver, spcid, vsrid, schanid, bits_per_sample, \
        ksamps_per_sec, sdplr, prdx_dss_id, prdx_sc_id, prdx_pass_num, \
        prdx_uplink_band, prdx_downlink_band, trk_mode, uplink_dss_id, ddc_lo, \
        rf_to_if_lo, data_error, year, doy, sec, data_time_offset, frov, fro, \
        frr, sfro, rf_freq, schan_accum_phase, (scpp0, scpp1, scpp2, scpp3), \
        schan_label = header
    # ksamps_per_sec = 1e3, number of complex samples in 'data' = 1e6
    num_32bit_words = len(data)*8/BITS_PER_32BIT_WORD
    cmplx_samp_per_word = (BITS_PER_32BIT_WORD/(2*bits_per_sample))
    cmplx_samples = unpack_vdr_data(num_32bit_words, cmplx_samp_per_word, data)
    del(data)                       # This makes no difference
    for j in range(0, ksamps_per_sec*1000/fft_size):
        index = int(j*fft_size)
        S = fft(cmplx_samples[index:index+fft_size])
        P += S*conjugate(S)
    del(cmplx_samples)              # This makes no difference
    if (i % 20) == 0:
        gc.collect(0)               # This makes no difference
P /= nsecs
sample_period = 1./ksamps_per_sec   # kHz
f = fftfreq(fft_size, d=sample_period)
What am I missing?
Best regards
Tom
p.s. Many of you will see this twice, for which I apologize.

On 06/06/2010 02:17 PM, Tom Kuiper wrote:
[...]
What am I missing?
I don't know, but I would suggest that you strip the example down further: instead of reading data from a file, use numpy.random.randn to generate fake data as needed. In other words, use only numpy functions--no readers, no unpackers. Put this minimal script into a file and run it from the command line, not in ipython. (Have you verified that you get the same result running a standalone script from the command line as running from ipython?) Put a memory-monitoring step inside, maybe at each outer loop iteration. You can use the matplotlib.cbook.report_memory function or similar:
def report_memory(i=0):  # argument may go away
    'return the memory consumed by process'
    from subprocess import Popen, PIPE
    pid = os.getpid()
    if sys.platform == 'sunos5':
        a2 = Popen('ps -p %d -o osz' % pid, shell=True,
                   stdout=PIPE).stdout.readlines()
        mem = int(a2[-1].strip())
    elif sys.platform.startswith('linux'):
        a2 = Popen('ps -p %d -o rss,sz' % pid, shell=True,
                   stdout=PIPE).stdout.readlines()
        mem = int(a2[1].split()[1])
    elif sys.platform.startswith('darwin'):
        a2 = Popen('ps -p %d -o rss,vsz' % pid, shell=True,
                   stdout=PIPE).stdout.readlines()
        mem = int(a2[1].split()[0])

    return mem
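Putting that together, a self-contained test might look something like the following. This is only a rough, untested sketch: it borrows the sizes from your post, uses numpy.random.randn in place of the reader and unpacker, imports report_memory from matplotlib.cbook (or you can paste the function above into the script), and accumulates the real part of S*conj(S) so that P stays a real array.

import numpy as np
from matplotlib.cbook import report_memory   # or define report_memory() as above

fft_size = 1000
nsecs = 255
nblocks = 1000          # 1000 blocks of fft_size samples = 1e6 fake samples per pass

P = np.zeros(fft_size)
for i in range(nsecs):
    # fake data in place of the reader/unpacker
    cmplx_samples = np.random.randn(nblocks*fft_size) \
                    + 1j*np.random.randn(nblocks*fft_size)
    for j in range(nblocks):
        index = j*fft_size
        S = np.fft.fft(cmplx_samples[index:index+fft_size])
        P += (S*np.conjugate(S)).real      # keep P real; avoids complex-to-float casting
    del cmplx_samples
    print("iteration %d: %s" % (i, report_memory()))   # process memory at each outer pass

P /= nsecs

If memory stays flat when run like this from the command line, the growth is coming from something outside the numpy part of your loop.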
I'm suspecting the problem may be in your data reader and/or unpacker, not in the application of numpy functions. Also, ipython can confuse the issue by keeping references to objects. In any case, with a simpler test script and regular memory monitoring, it should be easier for you to track down the problem.
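On the ipython point: anything you display at the prompt is also held in ipython's output cache (Out, _, __, ...), so an array can stay alive even after you del your own reference to it. A made-up session to illustrate:

In [1]: import numpy as np

In [2]: a = np.zeros(10**7); a        # echoing the array stores a reference in Out[2]
Out[2]: array([ 0.,  0.,  0., ...,  0.,  0.,  0.])

In [3]: del a                         # ~80 MB still alive, reachable through Out[2] and _

Running the test as a standalone script sidesteps this entirely.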
Eric
participants (2): Eric Firing, Tom Kuiper