[Numpy-discussion] Reading a big netcdf file

Gökhan Sever gokhansever at gmail.com
Wed Aug 3 13:01:24 EDT 2011


Just a few extra tests on my side pushing the limits of my system memory:

In [34]: k = np.zeros((21601, 10801, 3), dtype='int16')
k          ndarray     21601x10801x3: 699937203 elems, type `int16`,
1399874406 bytes (1335 Mb)
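
That footprint can be predicted from the shape and dtype alone, before
allocating anything. A minimal sketch of the arithmetic, which matches the
1335 Mb reported above:

import numpy as np

shape = (21601, 10801, 3)
nelems = np.prod(shape, dtype=np.int64)       # 699937203 elements
nbytes = nelems * np.dtype('int16').itemsize  # int16 is 2 bytes per element
print(nelems, nbytes, nbytes / 2.0**20)       # 699937203 1399874406 ~1335 MiB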

And for the first time the allocation exhausts my system memory, ending in a
hard kernel crash:

In [36]: k = np.zeros((21601, 10801, 13), dtype='int16')

Message from syslogd at ccn at Aug  3 10:51:43 ...
 kernel:[48715.531155] ------------[ cut here ]------------

Message from syslogd at ccn at Aug  3 10:51:43 ...
 kernel:[48715.531163] invalid opcode: 0000 [#1] SMP

Message from syslogd at ccn at Aug  3 10:51:43 ...
 kernel:[48715.531166] last sysfs file:
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map

Message from syslogd at ccn at Aug  3 10:51:43 ...
 kernel:[48715.531253] Stack:

Message from syslogd at ccn at Aug  3 10:51:43 ...
 kernel:[48715.531265] Call Trace:

Message from syslogd at ccn at Aug  3 10:51:43 ...
 kernel:[48715.531332] Code: be 33 01 00 00 48 89 fb 48 c7 c7 67 31 7a 81 e8
b0 2d f1 ff e8 90 f2 33 00 48 89 df e8 86 db 00 00 48 83 bb 60 01 00 00 00
74 02 <0f> 0b 48 8b 83 10 02 00 00 a8 20 75 02 0f 0b a8 40 74 02 0f 0b
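
For scale, that second request was for 21601 x 10801 x 13 int16 values, i.e.
6066122426 bytes, about 5.6 GiB in a single contiguous block, presumably more
than this machine can provide. If an array that size genuinely has to exist,
one standard workaround is a disk-backed memmap instead of an in-RAM
allocation. A minimal sketch ('scratch.dat' is a hypothetical scratch file,
not anything from the thread):

import numpy as np

# Disk-backed array: the OS pages data in and out on demand, so only the
# pages actually touched need to fit in physical memory.
k = np.memmap('scratch.dat', dtype='int16', mode='w+',
              shape=(21601, 10801, 13))
k[0, :, :] = 0   # writes (and materializes) just the first slab
k.flush()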


On Wed, Aug 3, 2011 at 10:46 AM, Gökhan Sever <gokhansever at gmail.com> wrote:

> Here are my values for your comparison:
>
> The test.nc file is about 715 MB. The details are below:
>
> In [21]: netCDF4.__version__
> Out[21]: '0.9.4'
>
> In [22]: np.__version__
> Out[22]: '2.0.0.dev-b233716'
>
> In [23]: from netCDF4 import Dataset
>
> In [24]: f = Dataset("test.nc")
>
> In [25]: f.variables['reflectivity'].shape
> Out[25]: (6, 18909, 506)
>
> In [26]: f.variables['reflectivity'].size
> Out[26]: 57407724
>
> In [27]: f.variables['reflectivity'][:].dtype
> Out[27]: dtype('float32')
>
> In [28]: timeit z = f.variables['reflectivity'][:]
> 1 loops, best of 3: 731 ms per loop
>
> How long does it take on your side to read that big array?
>
> On Wed, Aug 3, 2011 at 10:30 AM, Kiko <kikocorreoso at gmail.com> wrote:
>
>> Hi.
>>
>> I'm trying to read a big netcdf file (445 Mb) using netcdf4-python.
>>
>> The data are described as:
>> The GEBCO gridded data set is stored in NetCDF as a one-dimensional
>> array of 2-byte signed integers that represent integer elevations in metres.
>>
>> The complete data set gives global coverage. It consists of 21601 x 10801
>> data values, one for each one minute of latitude and longitude, for 233312401
>> points.
>> The data start at position 90°N, 180°W and are arranged in bands of 360
>> degrees x 60 points/degree + 1 = 21601 values. The data range eastward from
>> 180°W longitude to 180°E longitude, i.e. the 180° value is repeated.
>>
>> The problem is that it is very slow (or I am quite a newbie).
>>
>> Does anyone have a suggestion for a faster way to get these data into a
>> numpy array?
>>
>> Thanks in advance.
>>
>
>
> --
> Gökhan
>
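
For the GEBCO file in the original question, a minimal sketch of one way to
pull the whole grid into a 2-D numpy array in a single bulk read. The variable
name 'z' and the filename are assumptions, so check f.variables.keys() against
the actual file:

from netCDF4 import Dataset

f = Dataset("gebco_1min.nc")       # hypothetical filename
print(f.variables.keys())          # confirm the elevation variable's name

z = f.variables['z'][:]            # 'z' is an assumption; one read of the 1-D int16 array
grid = z.reshape(10801, 21601)     # latitude bands of 21601 values, 90N..90S
f.close()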



-- 
Gökhan