We have an application that has previously used masked arrays from Numpy 1.0.3. Part of saving files from that application involved pickling data types that contained these masked arrays. In the latest round of library updates, we've decided to move to the most recent version of matplotlib, which requires Numpy 1.1.

Unfortunately, when we try to unpickle the data saved with Numpy 1.0.3 in the new code using Numpy 1.1.0, it chokes because it can't import numpy.core.ma for the masked arrays. A check of Numpy 1.1.0 shows that this is now numpy.ma.core.

Does anyone have any advice on how we can unpickle the old data files and update the references to the new classes?

Thanks,
Anthony.

--
Anthony Floyd, PhD
Convergent Manufacturing Technologies Inc.
6190 Agronomy Rd, Suite 403
Vancouver BC V6T 1Z3 CANADA
Email: Anthony.Floyd@convergent.ca | Tel: 604-822-9682 x102
WWW: http://www.convergent.ca | Fax: 604-822-9659
CMT is hiring: See http://www.convergent.ca for details
Hi Anthony 2008/7/16 Anthony Floyd <anthony.floyd@convergent.ca>:
Unfortunately, when we try to unpickle the data saved with Numpy 1.0.3 in the new code using Numpy 1.1.0, it chokes because it can't import numpy.core.ma for the masked arrays. A check of Numpy 1.1.0 shows that this is now numpy.ma.core.
The maskedarray functionality has been rewritten, and is now `numpy.ma`. For the time being, the old package is still available as `numpy.oldnumeric.ma`. Regards Stéfan
Hi Stéfan,
2008/7/16 Anthony Floyd <anthony.floyd@convergent.ca>:
Unfortunately, when we try to unpickle the data saved with Numpy 1.0.3 in the new code using Numpy 1.1.0, it chokes because it can't import numpy.core.ma for the masked arrays. A check of Numpy 1.1.0 shows that this is now numpy.ma.core.
The maskedarray functionality has been rewritten, and is now `numpy.ma`. For the time being, the old package is still available as `numpy.oldnumeric.ma`.
Yes, we're aware it's changed. The problem is that when pickle unpickles the data, it tries to assign the data back to its original class ... and the class type for masked arrays under 1.0.3 is numpy.core.ma.MaskedArray. This class type has changed in 1.1.0 to numpy.ma.core.MaskedArray. Since pickle can't find the old type, it fails to load the data. What I need to know is how I can trick pickle or Numpy to put the old class into the new class. The only thing we've come up with is to create our own numpy.core.ma.MaskedArray in 1.1.0 as a class that inherits numpy.ma.core.MaskedArray and doesn't make any changes to it. It's extremely surprising to find a significant API change like this in a stable package. Thanks, Anthony.
2008/7/17 Anthony Floyd <anthony.floyd@convergent.ca>:
What I need to know is how I can trick pickle or Numpy to put the old class into the new class.
If you have an example data-file, send it to me off-list and I'll figure out what to do. Maybe it is as simple as np.core.ma = np.oldnumeric.ma
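The aliasing idea can be illustrated without NumPy. The following is only a minimal sketch: the made-up module name `old_fractions` plays the role of `numpy.core.ma`, the standard-library `fractions` module plays the role of the module that still provides the class, and `load_with_alias` is a hypothetical helper. It shows that pickle resolves classes through `sys.modules`; it does not show whether the old and new MaskedArray classes accept each other's pickled state.

```python
import fractions
import pickle
import sys

# Build a protocol-0 pickle, then rewrite its module name so that it looks
# like a file written when the class lived in a module that no longer exists.
# "old_fractions" is a hypothetical name used only for this demonstration.
current = pickle.dumps(fractions.Fraction(3, 4), protocol=0)
legacy = current.replace(b"cfractions\n", b"cold_fractions\n")


def load_with_alias(data, old_name, module):
    """Unpickle `data` while `old_name` temporarily resolves to `module`."""
    sys.modules[old_name] = module
    try:
        return pickle.loads(data)
    finally:
        # Remove the alias again so it does not leak into other code.
        del sys.modules[old_name]


restored = load_with_alias(legacy, "old_fractions", fractions)
print(restored)
```

Registering the alias only for the duration of the load keeps the workaround local to the file-reading code, rather than requiring a patched installation.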
It's extremely surprising to find a significant API change like this in a stable package.
I don't know if renaming things in np.core counts as an API change. Pickling is notoriously unreliable for storing arrays, which is why Robert wrote `load` and `save`. I hope that Pierre can get around to implementing MaskedArray storage for 1.2. Otherwise, you can already save the array and mask separately. Regards Stéfan
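As an illustration of saving the array and mask separately, here is a minimal sketch that writes the data, the mask and the fill value as plain arrays into one `.npz` archive. The helper names `save_masked` and `load_masked` and the archive keys are assumptions made for this example, not an existing NumPy API, and an in-memory buffer stands in for a real file.

```python
import io

import numpy as np


def save_masked(file, marr):
    """Write data, mask and fill value as plain arrays in one .npz archive."""
    np.savez(
        file,
        data=np.ma.getdata(marr),
        mask=np.ma.getmaskarray(marr),
        fill_value=np.asarray(marr.fill_value),
    )


def load_masked(file):
    """Rebuild a masked array from an archive written by save_masked."""
    with np.load(file) as archive:
        return np.ma.masked_array(
            archive["data"],
            mask=archive["mask"],
            fill_value=archive["fill_value"].item(),
        )


original = np.ma.masked_array(
    [1.0, 2.0, 3.0], mask=[False, True, False], fill_value=-999.0
)
buffer = io.BytesIO()  # stands in for a real file on disk
save_masked(buffer, original)
buffer.seek(0)
restored = load_masked(buffer)
```

Because the archive holds only plain arrays, reading it back does not depend on the module path of the MaskedArray class.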
On Thursday 17 July 2008 12:54:10 Stéfan van der Walt wrote:
I don't know if renaming things in np.core counts as an API change. Pickling is notoriously unreliable for storing arrays, which is why Robert wrote `load` and `save`. I hope that Pierre can get around to implementing MaskedArray storage for 1.2.
Wow, I'll see what I can do, but no promises.
Otherwise, you can already save the array and mask separately.
Another possibility is to store the MaskedArray as a record array, with one field for the data and one field for the mask.
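A minimal sketch of the record-array idea, assuming a structured dtype with the hypothetical field names `data` and `mask`; the helpers `to_structured` and `from_structured` are made up for this example. Note that this layout does not carry the fill value or any other MaskedArray option.

```python
import numpy as np


def to_structured(marr):
    """Pack a masked array into a structured array with two fields."""
    packed = np.empty(marr.shape, dtype=[("data", marr.dtype), ("mask", bool)])
    packed["data"] = np.ma.getdata(marr)
    packed["mask"] = np.ma.getmaskarray(marr)
    return packed


def from_structured(packed):
    """Rebuild a masked array from the two-field layout above."""
    return np.ma.masked_array(packed["data"], mask=packed["mask"])


marr = np.ma.masked_array([10, 20, 30], mask=[True, False, False])
packed = to_structured(marr)
restored = from_structured(packed)
```

The packed array is an ordinary ndarray, so it can be written with `numpy.save` like any other array.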
Hi Pierre, 2008/7/17 Pierre GM <pgmdevlist@gmail.com>:
Otherwise, you can already save the array and mask separately.
Another possibility is to store the MaskedArray as a record array, with one field for the data and one field for the mask.
What about the other parameters, such as fill value? Do we know its type beforehand? If we can come up with a robust way to convert a MaskedArray into (one or more) structured array(s), that would be perfect for storage purposes. Also, you wouldn't need to be volunteered to implement it :) Further, could we rename numpy.ma.core to numpy.ma._core? I think we should make it clear that users should not import from core directly. Cheers Stéfan
Further, could we rename numpy.ma.core to numpy.ma._core? I think we should make it clear that users should not import from core directly.
Just to add a bit of noise here, it's not that we were importing directly from .core, it's that pickle was telling us that the actual class associated with the masked array was numpy.ma.core.MaskedArray (erm, well, numpy.core.ma.MaskedArray in the older version). Changing the location *again* will break it again, in the exact same way. A>
On Thursday 17 July 2008 16:29:48 Stéfan van der Walt wrote:
Another possibility is to store the MaskedArray as a record array, with one field for the data and one field for the mask.
What about the other parameters, such as fill value?
Dang, forgot about that. Having a dictionary of options would be cool, but we can't store it inside a regular ndarray. If we write to a file, we may want to write a header first that would store all the metadata we need.
If we can come up with a robust way to convert a MaskedArray into (one or more) structured array(s), that would be perfect for storage purposes. Also, you wouldn't need to be volunteered to implement it :)
A few weeks ago, I played a bit with interfacing TimeSeries and pytables: the idea is to transform the series (basically a MaskedArray) into a record array, and add the parameters such as fill_value in the metadata section of the table. Works great, we may want to follow the same pattern. Moreover, hdf5 is portable.
Further, could we rename numpy.ma.core to numpy.ma._core? I think we should make it clear that users should not import from core directly.
Anthony raised a very good point against that, and I agree. There's no need for that. Anthony, just making a symlink from numpy/oldnumeric/ma.py to numpy/core/ma.py works to unpickle your array. I agree it's still impractical...
On Thu, Jul 17, 2008 at 3:18 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
Dang, forgot about that. Having a dictionary of options would be cool, but we can't store it inside a regular ndarray. If we write to a file, we may want to write a header first that would store all the metadata we need.
Not to derail the discussion, but I am a frequent user of Python's shelve module to archive large numpy arrays and associated sets of parameters into one very handy and accessible file. If numpy developers are discouraging use of this type of thing (shelve relies on pickle, is this correct?), then it would be super handy to be able to also include other data when saving arrays using numpy's intrinsic functions. Just a thought.
What I need to know is how I can trick pickle or Numpy to put the old class into the new class.
If you have an example data-file, send it to me off-list and I'll figure out what to do. Maybe it is as simple as
np.core.ma = np.oldnumeric.ma
Yes, pretty much. We've put ma.py into numpy.core where ma.py is nothing more than:
import numpy.oldnumeric.ma as ma
class MaskedArray(ma.MaskedArray): pass
It works, but becomes a bit of a headache because we now have to maintain our own numpy package so that all the developers get these three lines when they install numpy. Anyway, it lets us unpickle/unshelve the old data files with 1.1.0. The next step is to transition on the fly the old numpy.core.ma.MaskedArray classes to numpy.ma.core.MaskedArray classes so that when oldnumeric gets deprecated we're not stuck. Thanks for the input, Anthony.
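One way to do such an on-the-fly transition without touching the installed package is to override `pickle.Unpickler.find_class` and redirect the module path at load time. This is only a sketch under assumptions: the class `RenamingUnpickler`, the helper `load_legacy` and the table `MODULE_RENAMES` are hypothetical names, and the made-up module `old_fractions` lets the example run without NumPy. Whether the rewritten MaskedArray can restore state pickled by the 1.0.3 class would have to be checked against real files.

```python
import fractions
import io
import pickle

# Hypothetical rename table: old module path -> new module path.
# The NumPy entry is the rename described in this thread; "old_fractions"
# is a made-up name that lets the sketch run without NumPy installed.
MODULE_RENAMES = {
    "numpy.core.ma": "numpy.ma.core",
    "old_fractions": "fractions",
}


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects class lookups for renamed modules."""

    def find_class(self, module, name):
        module = MODULE_RENAMES.get(module, module)
        return super().find_class(module, name)


def load_legacy(data):
    """Unpickle bytes, applying MODULE_RENAMES to every class reference."""
    return RenamingUnpickler(io.BytesIO(data)).load()


# Simulate a file that was written under the old module name.
legacy = pickle.dumps(fractions.Fraction(3, 4), protocol=0).replace(
    b"cfractions\n", b"cold_fractions\n"
)
restored = load_legacy(legacy)

# Re-pickling records the current module path, so the data is migrated.
migrated = pickle.dumps(restored, protocol=0)
```

Loading each old file through such an unpickler and saving it again would leave files that no longer mention the old module path.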
On Thursday 17 July 2008 19:41:51 Anthony Floyd wrote:
What I need to know is how I can trick pickle or Numpy to put the old class into the new class.
If you have an example data-file, send it to me off-list and I'll figure out what to do. Maybe it is as simple as
np.core.ma = np.oldnumeric.ma
Yes, pretty much. We've put ma.py into numpy.core where ma.py is nothing more than:
import numpy.oldnumeric.ma as ma
class MaskedArray(ma.MaskedArray): pass
It works, but becomes a bit of a headache because we now have to maintain our own numpy package so that all the developers get these three lines when they install numpy.
Did you try changing the pickled data? IIRC you could simply search&replace, since the class name is at the beginning of the representation in clear text. (I see that this is a hack too, but it saves you from having to maintain your own numpy.) HTH, Hans
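The search-and-replace idea might look like the sketch below. It assumes the files were written with one of the older pickle protocols, where a class reference is the GLOBAL opcode followed by the module and class names in clear text; protocol 4 and later store these names as length-prefixed strings, so a plain replace would not be safe there. The helper `rewrite_module` and the module name `old_fractions` are hypothetical.

```python
import fractions
import pickle


def rewrite_module(data, old_module, new_module):
    """Swap the module path in clear-text class references of a pickle."""
    # A class reference is stored as: b"c" + module + b"\n" + name + b"\n".
    # Caveat: identical bytes inside a binary payload would be replaced too.
    old = b"c" + old_module.encode("ascii") + b"\n"
    new = b"c" + new_module.encode("ascii") + b"\n"
    return data.replace(old, new)


# Simulate a file that was written under a module name that no longer exists.
legacy = pickle.dumps(fractions.Fraction(3, 4), protocol=0).replace(
    b"cfractions\n", b"cold_fractions\n"
)
fixed = rewrite_module(legacy, "old_fractions", "fractions")
restored = pickle.loads(fixed)
```

For the case in this thread the call would be `rewrite_module(data, "numpy.core.ma", "numpy.ma.core")`, applied to a copy of the file rather than the original.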
participants (5)
- Anthony Floyd
- Hans Meine
- Mark Miller
- Pierre GM
- Stéfan van der Walt