[patch] read/write v5 .mat files with structs, cell arrays, objects, or function handles

I just submitted a patch (http://scipy.org/scipy/scipy/ticket/743) extending the functionality of io.matlab for v5 files, adding writers for structs, cell arrays, & objects, and readers+writers for function handles and 64-bit ints.
The patch modifies the numpy types of strings, structs, and objects loaded from matlab:
- matlab strings are loaded as numpy arrays of unicode strings, rather than numpy arrays of objects.
- structs are loaded as numpy arrays with dtype [(field_name, object), ...]
- matlab objects are wrapped in an mio5.MatlabObject instance, which contains a 1x1 struct & the object classname. (function handles are also wrapped in a special object in mio5.py)
These changes were to make the scipy<->matlab mapping more explicit. The changes to the string type probably don't break any existing code. The changes to the struct & matlab objects probably do. It's possible to make both of these backward compatible, if that's a priority. We're looking for feedback on that (or any other) issue.
Ray Jones
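For readers unfamiliar with the proposed types, here is a minimal numpy-only sketch of what a struct array with dtype [(field_name, object), ...] and a unicode string array look like. The field names and values are made up for illustration, and nothing here calls the patched io.matlab code.

```python
import numpy as np

# Hypothetical field names; every field holds an arbitrary Python object,
# mirroring the dtype [(field_name, object), ...] described in the message.
struct_dtype = np.dtype([("name", object), ("values", object)])

# A 1x2 "struct array": one record per Matlab struct element.
structs = np.empty((1, 2), dtype=struct_dtype)
structs[0, 0] = ("first", np.array([1.0, 2.0]))
structs[0, 1] = ("second", np.array([3.0]))

# A field can be read by name across the whole array.
names = structs["name"]

# Strings held in a unicode array (dtype kind 'U') rather than an object array.
strings = np.array(["abc", "de"])
```

The field names travel with the dtype, so they remain available even when the array has no elements.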

Hi Ray,
Those changes look really useful! Stéfan van der Walt <stefan@sun.ac.za> (on this email list) and Matthew Brett <matthew.brett@gmail.com> (not on the list?) seem to be coordinating the matlab IO (at least, they were the ones who shepherded my recent patch to the same).
I'm not 100% sure what the backwards-compatibility guarantees for scipy are right now, but the matlab 5 IO is pretty new, so hopefully useful changes like these won't be a problem. (Can anyone say for sure?)
One comment -- perhaps for objects, would a 0-d "array scalar" (i.e. shape=()) struct be better than a 1x1 struct? I'm not really sure either way...
Anyhow, thanks for the patch. This should be very helpful.
Best,
Zach
On Oct 1, 2008, at 11:52 AM, Thouis (Ray) Jones wrote:
I just submitted a patch (http://scipy.org/scipy/scipy/ticket/743) extending the functionality of io.matlab for v5 files, adding writers for structs, cell arrays, & objects, and readers+writers for function handles and 64-bit ints.
The patch modifies the numpy types of strings, structs, and objects loaded from matlab: matlab strings are loaded as numpy arrays of unicode strings, rather than numpy arrays of objects. structs are loaded as numpy arrays with dtype [(field_name, object), ...] matlab objects are wrapped in an mio5.MatlabObject instance, which contains a 1x1 struct & the object classname. (function handles are also wrapped in a special object in mio5.py)
These changes were to make the scipy<->matlab mapping more explicit. The changes to the string type probably don't break any existing code. The changes to the struct & matlab objects probably do. It's possible to make both of these backward compatible, if that's a priority. We're looking for feedback on that (or any other) issue.
Ray Jones
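Zach's question about a 0-d versus a 1x1 struct can be made concrete with plain numpy. This is only an illustrative sketch: the field name "x" and the value 42 are invented, and it says nothing about what mio5.MatlabObject actually stores.

```python
import numpy as np

# Hypothetical single-field struct dtype.
dtype = np.dtype([("x", object)])

# 1x1 struct array: matches Matlab, where every array is at least 2-D.
one_by_one = np.array([[(42,)]], dtype=dtype)

# 0-d struct array: shape (), a single record with no axes.
zero_d = np.array((42,), dtype=dtype)

# Reading the field from the 1x1 array needs two indices to unwrap it.
value_2d = one_by_one["x"][0, 0]

# Reading the field from the 0-d array needs an empty-tuple index.
field_0d = zero_d["x"]
value_0d = field_0d[()]
```

The 0-d form is lighter to unwrap on the Python side, while the 1x1 form keeps the shape Matlab itself would report.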

On Wed, Oct 1, 2008 at 11:52 AM, Zachary Pincus <zachary.pincus@yale.edu> wrote:
I'm not 100% sure what the backwards-compatibility guarantees for scipy are right now, but the matlab 5 IO is pretty new, so hopefully useful changes like these won't be a problem. (Can anyone say for sure?)
In general, backward compatibility for SciPy isn't as big a constraint as it is for NumPy. NumPy is an extremely mature codebase that is widely used by many projects, so we have to be *extremely* careful about ABI/API breaks. SciPy on the other hand is still considered 'beta' software, so we should be focusing on getting it to the point that we consider it more 'production' code. Once we get to that point we will need to start being much more careful about ABI/API breaks.
Of course, no one is going to want to use SciPy if we arbitrarily break their code with every release of SciPy. So we just need to strike a reasonable balance of fixing and improving SciPy, while not chasing off users.
In this particular case, the changes to the mat file io look very reasonable. I will let Stéfan van der Walt or Matthew Brett make the final call, but I would like to see this patch (or a revised version of it) get in.
Thanks,
--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/

Hi Ray 2008/10/1 Thouis (Ray) Jones <thouis@broad.mit.edu>:
These changes were to make the scipy<->matlab mapping more explicit. The changes to the string type probably don't break any existing code. The changes to the struct & matlab objects probably do. It's possible to make both of these backward compatible, if that's a priority. We're looking for feedback on that (or any other) issue.
Thank you for improving the SciPy IO capabilities! Before we go ahead, would you kindly do the following:
- Make sure that the existing test suite passes with your patch applied (I can't do that from here right now)
- Write tests to verify the working of the new functionality
You'll see that we have a very thorough round-tripping test set for the IO code already, and it would be good to keep this up to date with your changes.
Requiring tests for contributions is certainly not intended to slow the adoption of patches; we simply need to see what the code does, and must ensure that it does so correctly. Ideally, we'd also like documentation for all the classes in the IO module, but it would hardly be fair to ask you to write them :)
Thanks again,
Stéfan

On Wed, Oct 1, 2008 at 4:20 PM, Stéfan van der Walt <stefan@sun.ac.za> wrote:
Thank you for improving the SciPy IO capabilities! Before we go ahead, would you kindly do the following:
- Make sure that the existing test suite passes with your patch applied (I can't do that from here right now)
- Write tests to verify the working of the new functionality
Is it acceptable for us to modify the tests to handle the new form of some matlab objects? Structures, objects, and function handles are definitely different. String arrays have a different dtype, as well.
Also, I noticed there is a test for reading matlab objects. I think this should be removed, as the structure that matlab uses to save objects is not well documented (besides being a matlab 1x1 struct array with an extra element for the classname).
We'll try to get this done soon.
Best,
Ray Jones

On Thu, Oct 2, 2008 at 5:20 AM, Stéfan van der Walt <stefan@sun.ac.za> wrote:
Hi Ray
2008/10/1 Thouis (Ray) Jones <thouis@broad.mit.edu>:
These changes were to make the scipy<->matlab mapping more explicit. The changes to the string type probably don't break any existing code. The changes to the struct & matlab objects probably do. It's possible to make both of these backward compatible, if that's a priority. We're looking for feedback on that (or any other) issue.
Thank you for improving the SciPy IO capabilities! Before we go ahead, would you kindly do the following:
Would it be possible for this to wait until after 0.7? More exactly, I would like to keep the current behavior in 0.7, with a deprecation warning, and maybe make the new code available through some mechanism so people can adapt (it would replace the old code in 0.8). Even if scipy is not considered stable, I think we should allow at least one version for deprecation.
cheers,
David

2008/10/2 David Cournapeau <cournape@gmail.com>:
Would it be possible for this to wait until after 0.7? More exactly, I would like to keep the current behavior in 0.7, with a deprecation warning, and maybe make the new code available through some mechanism so people can adapt (it would replace the old code in 0.8). Even if scipy is not considered stable, I think we should allow at least one version for deprecation.
We should certainly look at applying the non-API-changing parts, though. I'm not sure what the best way is to represent these structures on the Python side. Thouis, you've thought about this a lot: could you tell us the pros and cons of switching to the new representation? I don't mind the tests changing, as long as the round-trip completes successfully and all the data is still read. Cheers Stéfan

Stéfan van der Walt <stefan@sun.ac.za> writes:
We should certainly look at applying the non-API-changing parts, though. I'm not sure what the best way is to represent these structures on the Python side.
Thouis, you've thought about this a lot: could you tell us the pros and cons of switching to the new representation?
The reason Ray and I changed some of the representations is that we wanted the mapping from Matlab to Python to be symmetric: anything read from a MAT-file should be represented in a way that allows the writer code to write it back in its original form. This requires that the original Matlab type be deducible from the Python representation.
* Struct arrays: Matlab struct arrays were previously represented as numpy arrays of dtype=object filled with instances of mat_struct. The problem is that Matlab cell arrays were also represented as numpy arrays of dtype=object. The writer code could in most cases have identified structs by looking at the contents (instances of mat_struct), but there was no way to distinguish a 0x0 cell array from a 0x0 struct array. We therefore opted to represent struct arrays as numpy record arrays. In order not to break existing code, we could introduce a keyword argument to loadmat that selects the old or new representation, similar to numpy.histogram's "new" argument. In 0.7, leaving the argument out would default to False (old behavior), but give a deprecation warning. Later versions can first change the default to True and then remove the old behavior entirely. The best name I can think of for this keyword argument is "struct_as_record".
* Char arrays/strings: Same story. At the lowest level, the code represented char arrays as numpy arrays of dtype='U1', which is fine. A very useful "processor function" (in miobase) turns them into arrays of strings, however. This processor function created an array of dtype=object. We changed this to 'U...' so the array could be distinguished from a cell array. I think this is unlikely to break any code, do you agree?
* Objects: This change in representation was purely for our convenience, and we should be able to fix our patch to keep the old representation.
Vebjorn
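The deducibility argument above can be sketched in a few lines of numpy. The helper `guess_matlab_class` is hypothetical and is not part of scipy or of the patch; it only shows that the dtype alone separates the cases, even for 0x0 arrays that have no contents to inspect.

```python
import numpy as np

def guess_matlab_class(arr):
    """Hypothetical helper: infer a Matlab class name from the dtype alone."""
    if arr.dtype.names is not None:   # structured dtype -> struct array
        return "struct"
    if arr.dtype.kind == "U":         # unicode strings -> char array
        return "char"
    if arr.dtype.kind == "O":         # generic objects -> cell array
        return "cell"
    return "numeric"

# Both arrays are 0x0, so there are no elements to look at,
# yet the dtype still tells them apart.
empty_cell = np.empty((0, 0), dtype=object)
empty_struct = np.empty((0, 0), dtype=[("field", object)])
text = np.array(["hello"])
```

Under the old scheme both empty arrays would have had dtype=object, which is the ambiguity the record representation is meant to remove.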

Hi,
The reason Ray and I changed some of the representations is that we wanted the mapping from Matlab to Python to be symmetric: anything read from a MAT-file should be represented in a way that allows the writer code to write it back in its original form. This requires that the original Matlab type be deducible from the Python representation.
Thank you very much for doing this work - it was certainly a major deficiency in the scipy matlab io that it could not do this roundtripping for structs etc.
For the API change, the API for the io has already changed in the trunk, from an earlier change (well-motivated) by Nathan Bell. I would love to see a version of this patch in 0.7 if at all possible.
Best,
Matthew

On Fri, Oct 3, 2008 at 12:34 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
For the API change, the API for the io has already changed in the trunk, from an earlier change (well-motivated) by Nathan Bell. I would love to see a version of this patch in 0.7 if at all possible.
Yes, I know, it broke my packages :) I don't think it is good practice. We allow for scipy API changes, but it would still be better if we said that something is about to change. It really does not cost much, I think.
cheers,
David

Hi,
For the API change, the API for the io has already changed in the trunk, from an earlier change (well-motivated) by Nathan Bell. I would love to see a version of this patch in 0.7 if at all possible.
Yes, I know, it broke my packages :) I don't think it is good practice. We allow for scipy API changes, but it would still be better if we said that something is about to change. It really does not cost much, I think.
All your points well taken, and I voted rather weakly against the earlier change, but, given the API is already changed compared to the previous release, and that Scipy releases are infrequent, my weak vote would still be for going the whole way here, but I'm happy to be outvoted. See you, Matthew

I've updated the patch to provide backwards compatibility for matlab structures, controlled by a keyword argument to loadmat (struct_as_record), which defaults to False (old style), and gives a deprecation warning if False.
All tests in test_mio.py pass. I updated some of the tests for the new object structure. I tested both options for loading structures with some of my own data, but it would be useful if David could check his code against it (if that's not too difficult), as we don't really have a relevant codebase prior to these changes.
Ray Jones
(note that I goofed and didn't supersede the old patch in the Trac, but it's obvious which is the right one.)
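The transitional keyword described in this message follows a common deprecation pattern, sketched below with the standard library only. `load_structs` is a hypothetical stand-in, not the patched loadmat: the input is a list of plain dicts, and SimpleNamespace and dict merely play the roles of the old and new representations.

```python
import warnings
from types import SimpleNamespace

def load_structs(parsed, struct_as_record=False):
    """Hypothetical loader with a transitional keyword (not scipy's loadmat)."""
    if not struct_as_record:
        # Old behavior stays the default for one release, but warns.
        warnings.warn(
            "old-style structs are deprecated; pass struct_as_record=True",
            DeprecationWarning,
            stacklevel=2,
        )
        return [SimpleNamespace(**fields) for fields in parsed]
    # New behavior, opted into explicitly.
    return [dict(fields) for fields in parsed]

# Usage: the default path warns, the explicit path does not.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = load_structs([{"a": 1}])
    new_style = load_structs([{"a": 1}], struct_as_record=True)
```

A later release can flip the default to True and, after that, drop the old branch, which is the schedule proposed earlier in the thread.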

Hi, On Fri, Oct 3, 2008 at 7:49 AM, Thouis (Ray) Jones <thouis@broad.mit.edu> wrote:
I've updated the patch to provide backwards compatibility for matlab structures, controlled by a keyword argument to loadmat (struct_as_record), which defaults to False (old style), and gives a deprecation warning if False.
I've applied this patch, with thanks, all tests pass for me too. David - any problems? Matthew

On Sun, Oct 5, 2008 at 10:04 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
I've applied this patch, with thanks, all tests pass for me too. David - any problems?
Nope, works for me, David

On Thu, Oct 2, 2008 at 11:14 PM, Vebjorn Ljosa <ljosa@broad.mit.edu> wrote:
The reason Ray and I changed some of the representations is that we wanted the mapping from Matlab to Python to be symmetric: anything read from a MAT-file should be represented in a way that allows the writer code to write it back in its original form. This requires that the original Matlab type be deducible from the Python representation.
FWIW, I agree completely that the current io for matlab would be greatly improved by those changes; there is no arguing that those changes should come in scipy. The problem is that some people have code which depends on the current shortcomings (well, *I* have), and it is better to have at least one full version which says that it will change (with a deprecation warning). The code could be in something like io.newmatlab, which would be changed to io.matlab in 0.8: this way, if people want to use it now, they can.
cheers,
David
participants (7)
- David Cournapeau
- Jarrod Millman
- Matthew Brett
- Stéfan van der Walt
- Thouis (Ray) Jones
- Vebjorn Ljosa
- Zachary Pincus