[Matrix-SIG] HELP! How to access a large array

Min Xu minxu@scisun.sci.ccny.cuny.edu
Mon, 20 Sep 1999 11:57:41 -0400


> minxu@sci.ccny.cuny.edu writes:
>  > But it turns out that it takes minutes for the second part to read the
>  > data from the netcdf file. That is discouraging, since I want to increase
>  > the size of the data further.
>  > 
> 
> Along which dimension are you doing the subsampling? Reading
> subsamples along the first (unlimited) dimension is very time-consuming
> but not memory-consuming. Do you actually see, from the memory
> footprint, that all the data is read in? Or do you assume this because
> of the long time it took? It should not read in all the data. Which
> version of the netcdf library are you using? Version 2.x is slower
> than the newer ones.
> 
As a first attempt, I tried to load the *whole* netcdf arrays in the second
part. The netcdf arrays are:

['w', 'mf_i', 'mf_r'], all doubles, with shapes:
>>> f.mf_r.shape
(13, 18, 18, 38, 38)
>>> f.mf_i.shape
(13, 18, 18, 38, 38)
>>> f.wf.shape
(10, 13, 10, 10, 20, 20)

The codelet is here:
    def getdata(self):
        f = netcdf.NetCDFFile("gkxy.nc", "r")
        for m in self.c.MEASTYPE:
            print 'Allocating for', m
            start = time()
            self.m[m] = zeros(f.variables[self.mmap[(m, self.c.SAMPTYPE[0])] + '_r'].shape, 'D')
            print 'time elapsed:', time() - start
            for s in self.c.SAMPTYPE:
                print 'Reading meas for', m, s
                start = time()
                self.m[m].real = self.m[m].real + \
                                f.variables[self.mmap[(m, s)] + '_r'][:]
                self.m[m].imaginary = self.m[m].imaginary + \
                                f.variables[self.mmap[(m, s)] + '_i'][:]
                print 'time elapsed:', time() - start
                print 'Reading weight for', m, s
                start = time()
                self.w[(m, s)] = f.variables[self.wmap[(m, s)]][:]
                print 'time elapsed:', time() - start
        f.close()

What I got is:
>>> i.getdata()
Allocating for 1
time elapsed: 1.53462100029
Reading meas for 1 1
time elapsed: 185.918686032
Reading weight for 1 1
time elapsed: 3.38862001896
>>> 

Although the sizes of meas and weight are comparable, the time ratio is about
*60*. My SGI200 machine has 256M of memory, but the CPU is idle (<1% busy) and
waiting on swap (>97%) most of the time while reading meas. Some 20M of free
memory is reported at the same time.

The netcdf library is version 3.4 and the Python module is Hinsen's.

Can you suggest some reasons for this?
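One guess of my own (illustrated with modern NumPy rather than the Numeric
module my codelet actually uses, so take the syntax as an assumption): each
statement of the form a.real = a.real + b evaluates the whole right-hand side
into a fresh temporary before storing it back, so several full-size arrays of
tens of megabytes can be alive at once, which could push a 256M machine into
swap. In-place addition writes straight into the existing buffer instead:

```python
import numpy as np

# Hypothetical stand-ins for the '_r' and '_i' variables read from the file.
shape = (13, 18, 18, 38, 38)
mf_r = np.ones(shape)           # real part, as f.variables[...'_r'][:] would give
mf_i = 2.0 * np.ones(shape)     # imaginary part

# Accumulator, analogous to self.m[m] in the codelet.
m = np.zeros(shape, dtype=complex)

# m.real = m.real + mf_r builds two full-size temporaries; in-place +=
# on the .real / .imag views modifies the complex buffer directly.
m.real += mf_r
m.imag += mf_i
```

Whether 1999-era Numeric supports in-place updates on the .real/.imaginary
views the same way is something I am not sure about.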

>  > Is there any way I can read subsets of an array stored in a netcdf
>  > file without loading the whole array? Does f.variables['var'][x, y, ...]
>  > avoid loading the whole 'var' array?
>  > 
> This should already be possible. I deal with NetCDF files of about 1 GB
> in size, so this works.
> 
I agree. But since the second part has to read hundreds of different subsets
of the netcdf array, almost covering the whole array, I wondered whether it
might be better to read the whole array at once.
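For what it's worth, the mechanism behind reading only a subset, without
pulling the whole variable into memory, can be illustrated with numpy.memmap
on a plain binary file (this is only an analogy for what a netcdf hyperslab
read does, not the netcdf API itself):

```python
import os
import tempfile
import numpy as np

# Write a largish array to a plain binary file (stand-in for a netcdf variable).
shape = (13, 18, 18, 38, 38)
path = os.path.join(tempfile.mkdtemp(), "var.dat")
np.arange(np.prod(shape), dtype=np.float64).reshape(shape).tofile(path)

# Memory-map the file: no data is read yet.
var = np.memmap(path, dtype=np.float64, mode="r", shape=shape)

# Slicing touches only the pages backing the requested subset,
# not the whole ~48 MB array.
subset = np.array(var[3, :, :, 10, 10])
```

A netcdf library that honours the start/count of the requested hyperslab
should behave similarly, at least when subsetting along the first dimension.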

Do you have a good method to handle complex data in netcdf?
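Lacking a complex type in netcdf, the best scheme I know of is the _r/_i
variable pair I already use; a small NumPy-flavoured sketch of stitching such
a pair back into one complex array (all names here are hypothetical, and the
syntax is modern NumPy, not Numeric):

```python
import numpy as np

def as_complex(real_part, imag_part):
    """Combine separately stored real/imaginary netcdf variables
    into a single complex array."""
    return np.asarray(real_part) + 1j * np.asarray(imag_part)

# Stand-ins for f.variables['mf_r'][:] and f.variables['mf_i'][:].
mf_r = np.full((2, 3), 1.0)
mf_i = np.full((2, 3), -2.0)
mf = as_complex(mf_r, mf_i)
```

An alternative layout is a single real variable with a trailing dimension of
length 2 (real, imag), which keeps the two components adjacent on disk.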



-- 
Min Xu				
City College of NY, CUNY		
Email:	mxu1@email.gc.cuny.edu
       	minxu@sci.ccny.cuny.edu	 
Tel:	(O) (212) 650-6865
	(O) (212) 650-5046
	(H) (212) 690-2119