Re: [Numpy-discussion] Home for pyhdf5io?
Hello Albert, On Thursday 21 May 2009 22:32:10, you wrote:
Hi, First of all thanks for your work on PyTables! I think it is excellent and it has been really nice working with it.
I'm writing because I have developed a small Python module that uses PyTables:
http://code.google.com/p/pyhdf5io/
It basically implements load/save functions that mimic the behaviour of those found in Matlab: with them you can store your variables from within the interactive shell (IPython, python) or from within a function, and then load them back in again.
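As a rough illustration of this Matlab-style workflow (not pyhdf5io's actual API, which is not shown in this thread), the same save-named-variables / load-them-back pattern can be sketched with plain NumPy:

    import numpy as np

    # Save a few variables under their names, Matlab-style...
    x = np.arange(10)
    y = np.random.rand(3, 3)
    np.savez('session.npz', x=x, y=y)

    # ...and later pull them back into the interactive namespace.
    data = np.load('session.npz')
    for name in data.files:
        globals()[name] = data[name]

pyhdf5io does the same kind of thing, but with HDF5 as the container instead of the .npz format.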
I've been having a look at your module and it seems pretty cute. Incidentally, there is another module that does similar things: http://www.elisanet.fi/ptvirtan/software/hdf5pickle/index.html However, I do like your package better in the sense that it adds more 'magic' to the load/save routines. But maybe you want to have a look at the above: it can give you more ideas, for example using CArrays and compression for very large arrays, or Tables for structured arrays.
And now to the question:
I think this module is too small to be developed and maintained on its own. It would be better if it could be part of some larger project, maybe PyTables, I don't know.
Sure. I think it could fit perfectly as a module inside PyTables, in the same vein as 'filenode' and 'netcdf3'. Most of your module can be dropped as-is into the PyTables module hierarchy. However, it would be nice if you could write some documentation following the format of the User's Guide chapters on the 'filenode' and 'netcdf3' modules. Please let's continue the discussion on the PyTables list if we need to. Thanks for contributing! -- Francesc Alted
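For readers unfamiliar with the PyTables features mentioned here, a minimal sketch of storing a large array as a compressed CArray and a structured array as a Table might look like this (written against the current PyTables API; the 2009-era method names were camelCased, e.g. openFile/createCArray):

    import numpy as np
    import tables

    with tables.open_file('big.h5', mode='w') as h5:
        # A large array stored chunked and zlib-compressed as a CArray.
        data = np.random.rand(1000, 1000)
        filters = tables.Filters(complevel=5, complib='zlib')
        h5.create_carray(h5.root, 'data', obj=data, filters=filters)

        # A structured array maps naturally onto a Table.
        rec = np.zeros(10, dtype=[('t', 'f8'), ('v', 'i4')])
        h5.create_table(h5.root, 'records', obj=rec)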
Fri, 22 May 2009 10:00:56 +0200, Francesc Alted wrote: [clip: pyhdf5io]
I've been having a look at your module and it seems pretty cute. Incidentally, there is another module that does similar things:
http://www.elisanet.fi/ptvirtan/software/hdf5pickle/index.html
However, I do like your package better in the sense that it adds more 'magic' to the load/save routines. But maybe you want to have a look at the above: it can give you more ideas, like for example, using CArrays and compression for very large arrays, or Tables for structured arrays.
I don't think these two are really comparable. The significant difference appears to be that pyhdf5io is a thin wrapper for File.createArray, so when it encounters non-array objects, it will pickle them to strings, and save the strings to the HDF5 file. Hdf5pickle, OTOH, implements the pickle protocol, and will unwrap non-array objects so that all their attributes etc. are exposed in the hdf5 file and can be read by non-Python applications. -- Pauli Virtanen
Thank you Pauli for your input. I agree with you that our projects have different goals, even if they touch on the same subject. The goal of pyhdf5io is to provide a very simple interface, so that the user can save his/her data. The reason for picking hdf5 was of course also to enable the use of the data in other programs. Here I was mostly thinking of simple numerical data, arrays etc., not so much of the complex datatypes that Python provides. So I guess in the long term I have to also add pickling support. In the short term I will add warnings for the data types that are not supported. /Albert 2009/5/22, Pauli Virtanen <pav@iki.fi>:
Fri, 22 May 2009 10:00:56 +0200, Francesc Alted wrote: [clip: pyhdf5io]
I've been having a look at your module and it seems pretty cute. Incidentally, there is another module that does similar things:
http://www.elisanet.fi/ptvirtan/software/hdf5pickle/index.html
However, I do like your package better in the sense that it adds more 'magic' to the load/save routines. But maybe you want to have a look at the above: it can give you more ideas, like for example, using CArrays and compression for very large arrays, or Tables for structured arrays.
I don't think these two are really comparable. The significant difference appears to be that pyhdf5io is a thin wrapper for File.createArray, so when it encounters non-array objects, it will pickle them to strings, and save the strings to the HDF5 file.
Hdf5pickle, OTOH, implements the pickle protocol, and will unwrap non-array objects so that all their attributes etc. are exposed in the hdf5 file and can be read by non-Python applications.
-- Pauli Virtanen
-- Sent from my mobile device
On 23-May-09, at 5:36 AM, Albert Thuswaldner wrote:
So I guess in the long term I have to also add pickling support. In the short term I will add warnings for the data types that are not supported.
In order to ensure optimal division of labour, I'd suggest simply basing your pickling support on hdf5pickle, and including it as an optional dependency that you detect at runtime (just put the import in a try block and catch the ImportError). If you have hdf5pickle installed, pyhdf5io will pickle any objects you try to save(), etc. Otherwise it will just work the way it does now. I think that satisfies the goal of your project of being a thin wrapper that provides a simple interface, rather than reinventing the wheel by re-implementing hdf5 pickling. It also means that there aren't two, maybe-incompatible ways to pickle an object in HDF5 -- just one (even if you write your implementation to be compatible with Pauli's, there's opportunity for the codebases to diverge over time). David
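The optional-dependency pattern David describes is simple to sketch (the names here are illustrative, not pyhdf5io's actual code):

    import numpy as np

    # Detect the optional dependency once, at import time.
    try:
        import hdf5pickle
    except ImportError:
        hdf5pickle = None

    def can_store(obj):
        """Return True if obj can be stored with the available backends."""
        if isinstance(obj, np.ndarray):
            return True
        # Non-array objects are only supported when hdf5pickle is installed.
        return hdf5pickle is not None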
Actually my vision with pyhdf5io is to have hdf5 replace numpy's own binary file format (.npy, .npz). Pyhdf5io (or an incarnation of it) should be the standard (binary) way to store data in scipy/numpy. A bold statement, I know, but I think that it would be an improvement, especially for those users who are replacing Matlab with scipy/numpy. I don't know if this vision of mine is possible to realize, or even something that is shared by anyone else in the community. So, list, what are your thoughts? Are there people out there who like the idea and want to cooperate in realizing it? /Albert On Sat, May 23, 2009 at 13:47, David Warde-Farley <dwf@cs.toronto.edu> wrote:
On 23-May-09, at 5:36 AM, Albert Thuswaldner wrote:
So I guess in the long term I have to also add pickling support. In the short term I will add warnings for the data types that are not supported.
In order to ensure optimal division of labour, I'd suggest simply basing your pickling support on hdf5pickle, and including it as an optional dependency, that you detect at runtime (just put the import in a try block and catch the ImportError). If you have hdf5pickle installed, pyhdf5io will pickle any objects you try to use save() with, etc. Otherwise it will just work the way it does now.
I think that satisfies the goals of your project as being a thin wrapper that provides a simple interface, rather than reinventing the wheel by re-implementing hdf5 pickling. It also means that there aren't two, maybe-incompatible ways to pickle an object in HDF5 -- just one (even if you write your implementation to be compatible with Pauli's, there's opportunity for the codebases to diverge over time).
David
On Sat, May 23, 2009 at 2:25 PM, Albert Thuswaldner < albert.thuswaldner@gmail.com> wrote:
Actually my vision with pyhdf5io is to have hdf5 replace numpy's own binary file format (.npy, .npz). Pyhdf5io (or an incarnation of it) should be the standard (binary) way to store data in scipy/numpy. A bold statement, I know, but I think that it would be an improvement, especially for those users who are replacing Matlab with scipy/numpy.
I don't know if this vision of mine is possible to realize, or even something that is shared by anyone else in the community. So list what are your thoughts? Are there some people out there, who like the idea and want to cooperate in realizing it?
I rather like the idea, but having a dependency on hdf5/pytables might be a bit much. <snip> Chuck
On 23-May-09, at 4:25 PM, Albert Thuswaldner wrote:
Actually my vision with pyhdf5io is to have hdf5 replace numpy's own binary file format (.npy, .npz). Pyhdf5io (or an incarnation of it) should be the standard (binary) way to store data in scipy/numpy. A bold statement, I know, but I think that it would be an improvement, especially for those users who are replacing Matlab with scipy/numpy.
In that it introduces a dependency on pytables (and the hdf5 C library) I doubt it would be something the numpy core developers would be eager to adopt. The npy and npz formats (as best I can gather) exist so that there is _some_ way of persisting data to disk that ships with numpy. They're not necessarily meant as the best way, or as an interchange format, just as something that works "out of the box", the code for which is completely contained within numpy. It might be worth mentioning the limitations of numpy's built-in save(), savez() and load() in the docstrings and recommending more portable alternatives, though. David
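For reference, the built-in routines under discussion are used like this:

    import numpy as np

    a = np.arange(5)
    b = np.eye(3)

    np.save('a.npy', a)                # one array per .npy file
    np.savez('arrays.npz', a=a, b=b)   # several named arrays in a zipped .npz

    a2 = np.load('a.npy')
    npz = np.load('arrays.npz')
    b2 = npz['b']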
On Sat, May 23, 2009 at 15:41, David Warde-Farley <dwf@cs.toronto.edu> wrote:
On 23-May-09, at 4:25 PM, Albert Thuswaldner wrote:
Actually my vision with pyhdf5io is to have hdf5 replace numpy's own binary file format (.npy, .npz). Pyhdf5io (or an incarnation of it) should be the standard (binary) way to store data in scipy/numpy. A bold statement, I know, but I think that it would be an improvement, especially for those users who are replacing Matlab with scipy/numpy.
In that it introduces a dependency on pytables (and the hdf5 C library) I doubt it would be something the numpy core developers would be eager to adopt.
The npy and npz formats (as best I can gather) exist so that there is _some_ way of persisting data to disk that ships with numpy. It's not meant necessarily as the best way, or as an interchange format, just as something that works "out of the box", the code for which is completely contained within numpy.
Yes. The full set of use cases and design constraints is considered here: http://svn.scipy.org/svn/numpy/trunk/doc/neps/npy-format.txt -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
David Warde-Farley wrote:
On 23-May-09, at 4:25 PM, Albert Thuswaldner wrote:
Actually my vision with pyhdf5io is to have hdf5 replace numpy's own binary file format (.npy, .npz). Pyhdf5io (or an incarnation of it) should be the standard (binary) way to store data in scipy/numpy. A bold statement, I know, but I think that it would be an improvement, especially for those users who are replacing Matlab with scipy/numpy.
In that it introduces a dependency on pytables (and the hdf5 C library) I doubt it would be something the numpy core developers would be eager to adopt.
The npy and npz formats (as best I can gather) exist so that there is _some_ way of persisting data to disk that ships with numpy. It's not meant necessarily as the best way, or as an interchange format, just as something that works "out of the box", the code for which is completely contained within numpy.
It might be worth mentioning the limitations of numpy's built-in save(), savez() and load() in the docstrings and recommending more portable alternatives, though.
David
I tend to agree with David that PyTables is too big a dependency for inclusion in core Numpy. It does a lot more than simply loading and saving arrays. While I haven't tried Andrew Collette's h5py (http://code.google.com/p/h5py), it looks like a very 'thin' wrapper around the HDF5 C libraries. Maybe numpy's save(), savez(), load(), memmap() could be enhanced so that saving/loading files with HDF5-like file extensions used the HDF5 format, with code based on h5py and pyhdf5io. This could, I imagine, be a relatively small/simple addition to numpy, with the only external dependency being the HDF5 libraries themselves. Stephen
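As an illustration of the extension-based dispatch Stephen has in mind (a sketch only, using h5py for the HDF5 branch; the function name and dataset name are made up):

    import os
    import numpy as np
    import h5py

    def smart_save(filename, arr):
        """Save arr as HDF5 when the extension suggests it, otherwise as .npy."""
        ext = os.path.splitext(filename)[1].lower()
        if ext in ('.h5', '.hdf5'):
            f = h5py.File(filename, 'w')
            try:
                f.create_dataset('data', data=arr)
            finally:
                f.close()
        else:
            np.save(filename, arr)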
On Sun, May 24, 2009 at 07:23, Stephen Simmons <mail@stevesimmons.com> wrote:
While I haven't tried Andrew Collette's h5py (http://code.google.com/p/h5py), it looks like a very 'thin' wrapper around the HDF5 C libraries. Maybe numpy's save(), savez(), load(), memmap() could be enhanced so that saving/loading files with HDF5-like file extensions used the HDF5 format, with code based on h5py and pyhdf5io. This could, I imagine, be a relatively small/simple addition to numpy, with the only external dependency being the HDF5 libraries themselves.
*libhdf5* is too big, not PyTables. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On 24-May-09, at 5:22 PM, Robert Kern wrote:
While I haven't tried Andrew Collette's h5py (http://code.google.com/p/h5py), it looks like a very 'thin' wrapper around the HDF5 C libraries. Maybe numpy's save(), savez(), load(), memmap() could be enhanced so that saving/loading files with HDF5-like file extensions used the HDF5 format, with code based on h5py and pyhdf5io. This could, I imagine, be a relatively small/simple addition to numpy, with the only external dependency being the HDF5 libraries themselves.
*libhdf5* is too big, not PyTables.
Yup. According to sloccount, numpy is roughly 210,000 lines of code. The hdf5 library is ~385,000 lines. Including even a small part of libhdf5 would grow the code base significantly, and requiring it as a dependency isn't a good idea since libhdf5 can be tricky to build right. As Robert's design document for the NPY format says, one option would be to implement a minimal subset of the HDF5 protocol *from scratch* (the subset that would be required for saving NumPy arrays as top-level leaf nodes, for example). This would also sidestep any tricky licensing issues (I don't know what the HDF5 license is exactly; I know it's fairly permissive, but it still might not be suitable for including any of it in NumPy). David
On Monday 25 May 2009 00:31:43, David Warde-Farley wrote:
As Robert's design document for the NPY format says, one option would be to implement a minimal subset of the HDF5 protocol *from scratch* (that would be required for saving NumPy arrays as top-level leaf nodes, for example). This would also sidestep any tricky licensing issues (I don't know what the HDF5 license is in particular, I know it's fairly permissive but still might not be suitable for including any of it in NumPy).
The license for HDF5 is BSD-based and apparently permissive enough, as can be seen in: http://www.hdfgroup.org/HDF5/doc/Copyright.html The problem is selecting the desired minimal protocol subset. In addition, this implementation may require quite a bit of work (but I've never had an in-depth look at the guts of the HDF5 library, so I may be wrong). Cheers, -- Francesc Alted
Francesc Alted wrote:
On Monday 25 May 2009 00:31:43, David Warde-Farley wrote:
As Robert's design document for the NPY format says, one option would be to implement a minimal subset of the HDF5 protocol *from scratch* (that would be required for saving NumPy arrays as top-level leaf nodes, for example). This would also sidestep any tricky licensing issues (I don't know what the HDF5 license is in particular, I know it's fairly permissive but still might not be suitable for including any of it in NumPy).
The license for HDF5 is BSD-based and apparently permissive enough, as can be seen in:
http://www.hdfgroup.org/HDF5/doc/Copyright.html
The problem is to select such a desired minimal protocol subset. In addition, this implementation may require quite a bit of work (but I've never had an in-depth look at the guts of the HDF5 library, so I may be wrong).
Cheers,
If the aim is to come up with a method of saving numpy arrays that uses a standard protocol and does not introduce large dependencies, then could this be accomplished using netcdf instead of hdf5, specifically Roberto De Almeida's pupynere, which is already in scipy.io as netcdf.py? Or does hdf5 have essential characteristics for this purpose that netcdf lacks? Eric
From what I understand, netCDF is based on HDF5, at least as of the version 4 release.
On Mon, May 25, 2009 at 19:19, Eric Firing <efiring@hawaii.edu> wrote:
Francesc Alted wrote:
On Monday 25 May 2009 00:31:43, David Warde-Farley wrote:
As Robert's design document for the NPY format says, one option would be to implement a minimal subset of the HDF5 protocol *from scratch* (that would be required for saving NumPy arrays as top-level leaf nodes, for example). This would also sidestep any tricky licensing issues (I don't know what the HDF5 license is in particular, I know it's fairly permissive but still might not be suitable for including any of it in NumPy).
The license for HDF5 is BSD-based and apparently permissive enough, as can be seen in:
http://www.hdfgroup.org/HDF5/doc/Copyright.html
The problem is to select such a desired minimal protocol subset. In addition, this implementation may require quite a bit of work (but I've never had an in-depth look at the guts of the HDF5 library, so I may be wrong).
Cheers,
If the aim is to come up with a method of saving numpy arrays that uses a standard protocol and does not introduce large dependencies, then could this be accomplished using netcdf instead of hdf5, specifically Roberto De Almeida's pupynere, which is already in scipy.io as netcdf.py? Or does hdf5 have essential characteristics for this purpose that netcdf lacks?
Eric
Albert Thuswaldner wrote:
From what I understand, netCDF is based on HDF5, at least as of the version 4 release.
Netcdf4 is indeed built on hdf5, but netcdf3 is not, and the netcdf3 format is likely to stick around for a *very* long time. The netcdf4 library is backwards-compatible with netcdf3. Eric
On Mon, May 25, 2009 at 19:19, Eric Firing <efiring@hawaii.edu> wrote:
Francesc Alted wrote:
On Monday 25 May 2009 00:31:43, David Warde-Farley wrote:
As Robert's design document for the NPY format says, one option would be to implement a minimal subset of the HDF5 protocol *from scratch* (that would be required for saving NumPy arrays as top-level leaf nodes, for example). This would also sidestep any tricky licensing issues (I don't know what the HDF5 license is in particular, I know it's fairly permissive but still might not be suitable for including any of it in NumPy). The license for HDF5 is BSD-based and apparently permissive enough, as can be seen in:
http://www.hdfgroup.org/HDF5/doc/Copyright.html
The problem is to select such a desired minimal protocol subset. In addition, this implementation may require quite a bit of work (but I've never had an in-depth look at the guts of the HDF5 library, so I may be wrong).
Cheers,
If the aim is to come up with a method of saving numpy arrays that uses a standard protocol and does not introduce large dependencies, then could this be accomplished using netcdf instead of hdf5, specifically Roberto De Almeida's pupynere, which is already in scipy.io as netcdf.py? Or does hdf5 have essential characteristics for this purpose that netcdf lacks?
Eric
On Monday 25 May 2009 19:55:28, Eric Firing wrote:
If the aim is to come up with a method of saving numpy arrays that uses a standard protocol and does not introduce large dependencies, then could this be accomplished using netcdf instead of hdf5, specifically Roberto De Almeida's pupynere, which is already in scipy.io as netcdf.py? Or does hdf5 have essential characteristics for this purpose that netcdf lacks?
After looking a bit at the code of pupynere, I found this line: assert magic == 'CDF', "Error: %s is not a valid NetCDF 3 file" % self.filename So the current version of pupynere is definitely for version 3 of NetCDF, not version 4.
From what I understand, netCDF is based on HDF5, at least as of the version 4 release.
Netcdf4 is indeed built on hdf5, but netcdf3 is not, and netcdf3 format is likely to stick around for a *very* long time. The netcdf4 library is backwards-compatible with netcdf3.
NetCDF4 is backwards-compatible with NetCDF3 only at the API level, not at the file-format level. NetCDF3 has a much simpler format, completely different from that of NetCDF4, which is based on HDF5. Cheers, -- Francesc Alted
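The difference is visible right in the leading bytes of a file: NetCDF3 files begin with the ASCII characters 'CDF' (as pupynere's assert checks), while NetCDF4/HDF5 files normally begin with the HDF5 superblock signature. A small sketch:

    def sniff_format(filename):
        """Very crude sniff of the container format from the leading magic bytes."""
        f = open(filename, 'rb')
        try:
            head = f.read(8)
        finally:
            f.close()
        if head.startswith(b'CDF'):
            return 'netcdf3'                    # classic or 64-bit-offset NetCDF
        if head.startswith(b'\x89HDF\r\n\x1a\n'):
            return 'hdf5'                       # includes NetCDF4 files
        return 'unknown'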
Francesc Alted wrote:
On Monday 25 May 2009 19:55:28, Eric Firing wrote:
If the aim is to come up with a method of saving numpy arrays that uses a standard protocol and does not introduce large dependencies, then could this be accomplished using netcdf instead of hdf5, specifically Roberto De Almeida's pupynere, which is already in scipy.io as netcdf.py? Or does hdf5 have essential characteristics for this purpose that netcdf lacks?
After looking a bit at the code of pupynere, there is the next line:
assert magic == 'CDF', "Error: %s is not a valid NetCDF 3 file" % self.filename
So, the current version of pupynere is definitely for version 3 of NetCDF, not version 4.
Yes, and I presume it will stay that way--which is fine for the question I am asking above. I should have said "netcdf3" explicitly. Its simplicity compared to hdf5 and netcdf4 is potentially a virtue. The question is, is it *too* simple for the intended purpose?
From what I understand, netCDF is based on HDF5, at least as of the
version 4 release. Netcdf4 is indeed built on hdf5, but netcdf3 is not, and netcdf3 format is likely to stick around for a *very* long time. The netcdf4 library is backwards-compatible with netcdf3.
NetCDF4 is backwards-compatible with NetCDF3 just at API level, not the file format. NetCDF3 has a much more simple format, and completely different from NetCDF4, which is based on HDF5.
Yes, but the netcdf4 *library* includes full netcdf3 compatibility; you can read and write netcdf3 using the netcdf4 library. For example, you can build Jeff Whitaker's http://code.google.com/p/netcdf4-python/ with all the hdf5 bells and whistles, and it will still happily read and, upon request, write netcdf3 files. Eric
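For example, with netcdf4-python you can ask the library explicitly for the classic on-disk format (a small sketch; the variable and file names are just illustrative):

    import numpy as np
    from netCDF4 import Dataset

    # Write a plain NetCDF3 "classic" file through the netcdf4 library.
    ds = Dataset('out.nc', 'w', format='NETCDF3_CLASSIC')
    ds.createDimension('x', 10)
    v = ds.createVariable('data', 'f8', ('x',))
    v[:] = np.arange(10.0)
    ds.close()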
Cheers,
On Monday 25 May 2009 20:29:25, Eric Firing wrote:
Francesc Alted wrote:
On Monday 25 May 2009 19:55:28, Eric Firing wrote:
If the aim is to come up with a method of saving numpy arrays that uses a standard protocol and does not introduce large dependencies, then could this be accomplished using netcdf instead of hdf5, specifically Roberto De Almeida's pupynere, which is already in scipy.io as netcdf.py? Or does hdf5 have essential characteristics for this purpose that netcdf lacks?
After looking a bit at the code of pupynere, there is the next line:
assert magic == 'CDF', "Error: %s is not a valid NetCDF 3 file" % self.filename
So, the current version of pupynere is definitely for version 3 of NetCDF, not version 4.
Yes, and I presume it will stay that way--which is fine for the question I am asking above. I should have said "netcdf3" explicitly. Its simplicity compared to hdf5 and netcdf4 is potentially a virtue.
The question is, is it *too* simple for the intended purpose?
I don't think the question is whether a format would be too simple or not, but rather about file compatibility. In that sense HDF5 is emerging as a de facto standard, and many tools are acquiring the capability to read/write this format (e.g. Matlab, IDL, Octave, Mathematica, R, NetCDF4-based apps and many others). Having this interchange capability is what should be seen as desirable, IMO.
From what I understand, netCDF is based on HDF5, at least as of the
version 4 release.
Netcdf4 is indeed built on hdf5, but netcdf3 is not, and netcdf3 format is likely to stick around for a *very* long time. The netcdf4 library is backwards-compatible with netcdf3.
NetCDF4 is backwards-compatible with NetCDF3 just at API level, not the file format. NetCDF3 has a much more simple format, and completely different from NetCDF4, which is based on HDF5.
Yes, but the netcdf4 *library* includes full netcdf3 compatibility; you can read and write netcdf3 using the netcdf4 library. For example, you can build Jeff Whitaker's http://code.google.com/p/netcdf4-python/ with all the hdf5 bells and whistles, and it will still happily read and, upon request, write netcdf3 files.
Again, I think that the issue is compatibility with other tools, not just between NetCDF3/NetCDF4 worlds. -- Francesc Alted
David Warde-Farley wrote:
As Robert's design document for the NPY format says, one option would be to implement a minimal subset of the HDF5 protocol *from scratch*
That would be really cool -- I wonder how hard it would be to implement just the current NPY features? Judging from this: http://www.hdfgroup.org/HDF5/doc/H5.format.html It's far from trivial! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
On Tue, May 26, 2009 at 00:11, Christopher Barker <Chris.Barker@noaa.gov> wrote:
David Warde-Farley wrote:
As Robert's design document for the NPY format says, one option would be to implement a minimal subset of the HDF5 protocol *from scratch*
That would be really cool -- I wonder how hard it would be to implement just the current NPY features? Judging from this:
http://www.hdfgroup.org/HDF5/doc/H5.format.html
It's far from trivial!
Yes. That's why I wrote the NPY format instead. I *did* do some due diligence before I designed a new binary format. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
Yes. That's why I wrote the NPY format instead. I *did* do some due diligence before I designed a new binary format.
I assumed so, and I also assume you took a look at netcdf3, but since it's been brought up here, I take it it didn't fit the bill? Even if it did, while it will be around for a LONG time, it is an out-of-date format. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
On Tue, May 26, 2009 at 00:50, Christopher Barker <Chris.Barker@noaa.gov> wrote:
Robert Kern wrote:
Yes. That's why I wrote the NPY format instead. I *did* do some due diligence before I designed a new binary format.
I assumed so, and I also assume you took a look at netcdf3, but since it's been brought up here, I take it it didn't fit the bill? Even if it did, while it will be around for a LONG time, it is an out-of-date format.
Lack of unsigned and 64-bit integers for the most part. But even if they were supported, I didn't see much point in using a standard that is being replaced by its own community. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
On Tue, May 26, 2009 at 00:50, Christopher Barker <Chris.Barker@noaa.gov> wrote:
I assumed so, and I also assume you took a look at netcdf3, but since it's been brought up here, I take it it didn't fit the bill?
Lack of unsigned and 64-bit integers for the most part. But even if they were supported, I didn't see much point in using a standard that is being replaced by its own community.
I agree -- I, and many others, are using netcdf4 libs to work with netcdf3 files, but the change will come. So we have a technical and social reason not to use it. Good enough for me. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On 26-May-09, at 1:15 AM, Robert Kern wrote:
I *did* do some due diligence before I designed a new binary format.
Uh oh, I feel this might've taken a sharp turn towards another "of course Robert is right, Robert is always right" thread. :) David
On Tuesday 26 May 2009 08:02:43, David Warde-Farley wrote:
On 26-May-09, at 1:15 AM, Robert Kern wrote:
I *did* do some due diligence before I designed a new binary format.
Uh oh, I feel this might've taken a sharp turn towards another "of course Robert is right, Robert is always right" thread. :)
Agreed :) -- Francesc Alted
On Sat, May 23, 2009 at 06:47, David Warde-Farley <dwf@cs.toronto.edu> wrote:
On 23-May-09, at 5:36 AM, Albert Thuswaldner wrote:
So I guess in the long term I have to also add pickling support. In the short term I will add warnings for the data types that are not supported.
In order to ensure optimal division of labour, I'd suggest simply basing your pickling support on hdf5pickle, and including it as an optional dependency, that you detect at runtime (just put the import in a try block and catch the ImportError). If you have hdf5pickle installed, pyhdf5io will pickle any objects you try to use save() with, etc. Otherwise it will just work the way it does now.
That would cause difficulties. Now the format of your data depends on whether or not you have a package installed. That's not a very good level of control. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On 23-May-09, at 4:59 PM, Robert Kern wrote:
Otherwise it will just work the way it does now.
That would cause difficulties. Now the format of your data depends on whether or not you have a package installed. That's not a very good level of control.
Sorry, I wasn't clear. What I meant was, if hdf5pickle isn't detected you could just refuse to save anything that's not a numpy array. David
On Sat, May 23, 2009 at 16:47, David Warde-Farley <dwf@cs.toronto.edu> wrote:
On 23-May-09, at 4:59 PM, Robert Kern wrote:
Otherwise it will just work the way it does now.
That would cause difficulties. Now the format of your data depends on whether or not you have a package installed. That's not a very good level of control.
Sorry, I wasn't clear. What I meant was, if hdf5pickle isn't detected you could just refuse to save anything that's not a numpy array.
Ah, good. That makes much more sense. :-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Ok, I understand that my thought of making hdf5 the standard save/load format for numpy was a bit naive. If it had been easy, it would already have been done. Thanks for the insights, Robert. Anyhow, I will continue with my little module and see where it goes. I will start a new thread on the PyTables list to discuss the steps needed to add pyhdf5io to the PyTables project. Thanks to everyone who took part in this discussion. /Albert On Sat, May 23, 2009 at 23:56, Robert Kern <robert.kern@gmail.com> wrote:
On Sat, May 23, 2009 at 16:47, David Warde-Farley <dwf@cs.toronto.edu> wrote:
On 23-May-09, at 4:59 PM, Robert Kern wrote:
Otherwise it will just work the way it does now.
That would cause difficulties. Now the format of your data depends on whether or not you have a package installed. That's not a very good level of control.
Sorry, I wasn't clear. What I meant was, if hdf5pickle isn't detected you could just refuse to save anything that's not a numpy array.
Ah, good. That makes much more sense. :-)
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
participants (9)
- Albert Thuswaldner
- Charles R Harris
- Christopher Barker
- David Warde-Farley
- Eric Firing
- Francesc Alted
- Pauli Virtanen
- Robert Kern
- Stephen Simmons