Transparently reading complex arrays from netCDF4

Hi,

I am using netCDF4 to store complex data, following the recommended strategy of creating a compound data type with the real and imaginary parts. This all works well, but reading the data into a numpy array is a bit clumsy. Typically I do:

    import netCDF4

    nc = netCDF4.Dataset('my.nc')
    cplx_data = nc.groups['mygroup'].variables['cplx_stuff'][:].view('complex')

which directly gives a nice complex numpy array. This is OK for small arrays, but it is wasteful if I only need some chunks of the array, because it reads all of the data in, which undercuts the benefit of netCDF's mmap feature.

I'm wondering if there is a better way to make a numpy array view that uses the netCDF variable's memory-mapped buffer directly. Looking at the Variable class, there is no direct access to this buffer, which could otherwise be passed to np.ndarray(buffer=...).

Any ideas for simple solutions to this problem?

Thanks,
Glenn
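For readers unfamiliar with the .view('complex') trick, here is a NumPy-only sketch of what it does, with an in-memory structured array standing in for data read from the netCDF variable. It assumes the compound type consists of exactly two float64 fields; the field names real and imag are illustrative assumptions.

```python
import numpy as np

# Assumed layout of the netCDF compound type: two float64 fields.
# The names 'real' and 'imag' are illustrative, not required by netCDF4.
complex_struct = np.dtype([('real', np.float64), ('imag', np.float64)])

# Stand-in for data read from the variable: a 1-D structured array.
records = np.zeros(5, dtype=complex_struct)
records['real'] = np.arange(5)
records['imag'] = -np.arange(5)

# Reinterpret each 16-byte (real, imag) record as one complex128 value.
# No data is copied; the view shares memory with `records`.
cplx = records.view('complex')

# Slicing first and then viewing converts only the chosen records.
first_two = records[:2].view('complex')
```

Because each 16-byte record has the same size as a complex128, the view only reinterprets the bytes. If the compound type stored float32 parts, complex64 would be the matching view; a type with padding or with the imaginary field first would not map cleanly onto a complex view.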

In reply to G Jones <glenn.caltech@gmail.com>, Sat, Mar 29, 2014 at 6:13 PM:

Hi Glenn,

My usual strategy for this sort of thing is to make a lightweight wrapper class that reads and converts values when you access them. For example:

    import netCDF4

    class WrapComplex(object):
        def __init__(self, nc_var):
            self.nc_var = nc_var

        def __getitem__(self, item):
            # Read only the requested part of the variable, then
            # reinterpret the (real, imag) records as complex values.
            return self.nc_var[item].view('complex')

    nc = netCDF4.Dataset('my.nc')
    cplx_data = WrapComplex(nc.groups['mygroup'].variables['cplx_stuff'])

Now you can index cplx_data (e.g., cplx_data[:10]) and only the values you need will be read from disk and converted on the fly.

Hope this helps!

Cheers,
Stephan
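To check the claim that only the indexed values are requested, here is a self-contained sketch that replaces the netCDF4 variable with a hypothetical FakeVariable, which serves a structured array and logs every key it is asked for. FakeVariable is an assumption for testing only; WrapComplex is reproduced from the message above.

```python
import numpy as np

complex_struct = np.dtype([('real', np.float64), ('imag', np.float64)])


class FakeVariable(object):
    """Hypothetical stand-in for netCDF4.Variable: indexing returns a
    structured array, and every requested key is logged."""

    def __init__(self, data):
        self._data = data
        self.requests = []

    def __getitem__(self, item):
        self.requests.append(item)
        # Copy to mimic reading fresh data from disk.
        return self._data[item].copy()


class WrapComplex(object):
    def __init__(self, nc_var):
        self.nc_var = nc_var

    def __getitem__(self, item):
        # Only the requested slice is fetched, then viewed as complex128.
        return self.nc_var[item].view('complex')


data = np.zeros(1000, dtype=complex_struct)
data['real'] = np.arange(1000)
data['imag'] = 1.0

var = FakeVariable(data)
cplx_data = WrapComplex(var)
head = cplx_data[:10]
```

With a real file you would pass nc.groups['mygroup'].variables['cplx_stuff'] instead. The wrapper never asks the variable for more than the requested slice, though how much netCDF4 actually reads from disk for that slice is up to the library.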
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

In reply to "Stephan Hoyer" <shoyer@gmail.com>, Mar 29, 2014 10:59 PM:

Hi Stephan,

Thanks for the reply. I was thinking of something along these lines but was hesitant: while this provides clean access to chunks of the data, you still have to remember to write cplx_data[:].mean(), for example, when you want cplx_data.mean().

I was hoping to have all of the ndarray methods at hand without any indexing, while still being smart about taking advantage of the mmap when possible. But perhaps your solution is the best compromise.

Thanks again,
Glenn
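One way to get something like cplx_data.mean() without loading the whole variable at once is to give the wrapper explicit reduction methods that stream over the data in blocks. This is a minimal sketch, assuming a non-empty 1-D variable; ChunkedComplex and its block_size parameter are hypothetical names, and a structured array again stands in for the netCDF variable (anything with .shape and slice indexing would do).

```python
import numpy as np

complex_struct = np.dtype([('real', np.float64), ('imag', np.float64)])


class ChunkedComplex(object):
    """Hypothetical wrapper: explicit methods that stream over a 1-D
    compound (real, imag) variable in fixed-size blocks."""

    def __init__(self, nc_var, block_size=256):
        self.nc_var = nc_var
        self.block_size = block_size

    def __getitem__(self, item):
        return self.nc_var[item].view('complex')

    def mean(self):
        # Accumulate the sum block by block, so at most block_size
        # records are converted and held in memory at any one time.
        n = self.nc_var.shape[0]
        total = 0j
        for start in range(0, n, self.block_size):
            total += self[start:start + self.block_size].sum()
        return total / n


data = np.zeros(1000, dtype=complex_struct)
data['real'] = np.arange(1000)
data['imag'] = 2.0

cplx_data = ChunkedComplex(data, block_size=100)
m = cplx_data.mean()
```

Peak memory is bounded by block_size records rather than by the whole variable; the cost is one read per block and the need to write out each method you want by hand.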

In reply to G Jones <glenn.caltech@gmail.com>, Sat, Mar 29, 2014 at 8:42 PM:

Hi Glenn,

Here is a full example of how we wrap a netCDF4.Variable object, implementing all of its ndarray-like methods:

https://github.com/akleeman/xray/blob/0c1a963be0542b7303dc875278f3b163a15429...

The __array__ method would be the most relevant one for you: it lets numpy convert the wrapper into a numpy.ndarray when you call np.mean(cplx_data). More generally, any function that calls np.asarray(cplx_data) will convert the values properly, which should include most functions from well-written libraries (including numpy and scipy). netCDF4.Variable doesn't currently have such an __array__ method, but it will in the next released version of the library.

The quick-and-dirty hack to make all numpy methods work (now going beyond what the netCDF4 library implements) would be to add something like the following:

    def __getattr__(self, attr):
        return getattr(np.asarray(self), attr)

But this is a little dangerous, since some methods might silently fail or give unpredictable results (e.g., in-place methods such as fill() or sort(), which would modify a temporary array rather than the data in the file). It would be safer to list the methods you want to implement explicitly, or to just use np.asarray liberally. The latter is generally good practice when writing library code anyway, to catch unusual ndarray subclasses like np.matrix.

Stephan
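A minimal sketch of both ideas, again with a NumPy structured array standing in for the netCDF variable. ComplexArrayWrapper is a hypothetical name, not the xray class behind the truncated link. The __array__ method lets np.asarray and np.mean accept the wrapper; shape and dtype are listed explicitly so they need no data read; the __getattr__ fallback follows the quick-and-dirty hack but, as an added precaution, refuses dunder names. NumPy probes objects for special attributes such as __array_interface__ during conversion, and forwarding those through np.asarray(self) could recurse back into __getattr__.

```python
import numpy as np

complex_struct = np.dtype([('real', np.float64), ('imag', np.float64)])


class ComplexArrayWrapper(object):
    """Hypothetical wrapper around a compound (real, imag) variable."""

    def __init__(self, nc_var):
        self.nc_var = nc_var

    def __getitem__(self, item):
        return self.nc_var[item].view('complex')

    def __array__(self, dtype=None, copy=None):
        # Full read, converted to complex128. `copy` is accepted for
        # NumPy 2 compatibility but ignored in this sketch.
        values = self.nc_var[:].view('complex')
        if dtype is not None:
            values = values.astype(dtype)
        return values

    # Explicitly listed attributes: answered without reading any data.
    @property
    def shape(self):
        return self.nc_var.shape

    @property
    def dtype(self):
        return np.dtype('complex')

    def __getattr__(self, attr):
        # Only reached for attributes not found above. Refuse special
        # names so NumPy's probing cannot recurse into np.asarray(self).
        if attr.startswith('__'):
            raise AttributeError(attr)
        # Quick-and-dirty fallback: read everything, then delegate.
        return getattr(np.asarray(self), attr)


data = np.zeros(4, dtype=complex_struct)
data['real'] = [1, 2, 3, 4]
data['imag'] = [0, 0, 0, 4]

w = ComplexArrayWrapper(data)
```

With this in place, np.mean(w), np.asarray(w) and w.sum() all work, while w.shape avoids a full read. Anything resolved through __getattr__ still operates on a freshly read array, so the caveat about in-place methods applies unchanged.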

In reply to "Stephan Hoyer" <shoyer@gmail.com>, Mar 30, 2014 2:18 AM:

Hi,

This looks useful. What you said about __array__ makes sense, but I didn't see it in the code you linked. Do you know when the Python netCDF4 library will support the numpy array interface directly? I searched around for a roadmap but didn't find anything. It may be best for me to proceed with a slightly clumsy interface for now and wait until the array interface is built in for free.

Thanks,
Glenn

In reply to G Jones <glenn.caltech@gmail.com>, Sun, Mar 30, 2014 at 5:18 AM:

Hi Glenn,

Here is the line in my linked code defining the __array__ method:

https://github.com/akleeman/xray/blob/0c1a963be0542b7303dc875278f3b163a15429...

I don't know when Jeff Whitaker will be releasing the next version of netCDF4, but I expect it might be pretty soon if you asked nicely! Otherwise, you can always download the development version from GitHub:

https://github.com/Unidata/netcdf4-python

Cheers,
Stephan
participants (2)
- G Jones
- Stephan Hoyer