[Numpy-discussion] numpy pickling problem - python 2 vs. python 3

Arnd Baecker arnd.baecker at web.de
Thu Mar 5 17:52:48 EST 2015


On Thu, 5 Mar 2015, Ryan Nelson wrote:

> This works if run from Py3. Don't know if it will *always* work. From that GH discussion you linked, it sounds
> like that is a bit of a hack.

Great - based on your code I could modify my loader routine so that
on python 3 it can load the files generated on python 2. Many thanks!

Still, I would have expected this to work out of the box,
i.e. without the pickle.loads trick?

[... code ...]
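Ryan's snippet is not reproduced above. The following is only a minimal sketch of the same general idea, not his code. On Python 3 the attribute comes back as the raw Python 2 pickle bytes (the b'(lp1\ncnumpy...' string from the original report), so those bytes can be unpickled explicitly with encoding="latin1". `load_legacy_attr` is a hypothetical helper name, and it assumes that any bytes value it receives is such a pickle.

```python
import pickle


def load_legacy_attr(value):
    """Return `value`, unpickling it first if it is a raw Python 2 pickle.

    Hypothetical helper (not part of PyTables): it assumes that any ``bytes``
    value handed back for a node attribute is a pickle written by Python 2.
    """
    if isinstance(value, bytes):
        # encoding="latin1" maps every byte of a Python 2 str to one code
        # point, so it never fails on binary data such as ndarray buffers.
        return pickle.loads(value, encoding="latin1")
    return value


# Usage sketch with a hand-made Python 2 style pickle (protocol 2) of a
# list holding two Python 2 str objects, one of them containing byte 0xf0.
legacy = b"\x80\x02]U\x03\xf0abaU\x02hia."
print(load_legacy_attr(legacy))      # ['ðab', 'hi']
print(load_legacy_attr([1.0, 2.0]))  # already unpickled: returned unchanged
```

In the loader script this would wrap the attribute read, e.g. result = load_legacy_attr(fpt.get_node_attr("/", "list_of_arrays")), so files written on Python 3 (which already come back as a list) pass through untouched.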

> However, I would consider defining some sort of v2 of your HDF file format, which converts all of the lists of
> arrays to CArrays or EArrays in the HDF file.
> (https://pytables.github.io/usersguide/libref/homogenous_storage.html) Otherwise, what is the advantage of using
> HDF files over just plain shelves?... Just a thought.

Thanks for the suggestion. In our usage scenario
lists of arrays are an edge case, and only small parts of the data in the
files use them; the larger arrays are written directly.
So for now I don't mind if the lists of arrays
are written in the current way (as long as they load fine).

For our applications the main benefit of using HDF files is
that we can easily look into them (e.g. using ViTables),
so I am not using any of the nicer, more advanced features
of HDF at this point... ;-).

Again many thanks for the prompt reply and solution!

Best, Arnd

> Ryan
> 
> On Thu, Mar 5, 2015 at 2:52 AM, Arnd Baecker <arnd.baecker at web.de> wrote:
>       Dear all,
>
>       while preparing the transition of our repositories from Python 2
>       to Python 3, I ran into a problem loading PyTables (.h5) files
>       generated with Python 2.
>       I suspect that it is caused by a problem with unpickling numpy arrays
>       under Python 3:
>
>       The code appended at the end of this mail works
>       fine when run entirely on either Python 2.7 or Python 3.4. However,
>       generating the data on Python 2 and loading
>       it on Python 3 returns a strange bytes string
>       (b'(lp1\ncnumpy.core.multiarray\n_reconstruct\np2\n(cnumpy\nndarray ...)
>       instead of
>           [array([ 0.,  1.,  2.,  3.,  4.,  5.]),
>            array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])]
>
>       The problem sounds very similar to the one reported here
>          https://github.com/numpy/numpy/issues/4879
>       which was fixed with numpy 1.9.
>
>       I tried different versions/combinations of numpy (including 1.9.2)
>       and always ended up with the above result.
>       I also tried to reduce the problem to plain numpy
>       and pickle (as in the above bug report):
>
>          import numpy as np
>          import pickle
>          arr1 = np.linspace(0.0, 1.0, 2)
>          arr2 = np.linspace(0.0, 2.0, 3)
>          data = [arr1, arr2]
>
>          p = pickle.dumps(data)
>          print(pickle.loads(p))
>          print(repr(p))  # copy this string literal over to Python 3
>
>       Using the resulting string for p as input
>       (with a b prefix added at the beginning) under Python 3 gives
>          UnicodeDecodeError: 'ascii' codec can't decode
>          byte 0xf0 in position 14: ordinal not in range(128)
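Why this error appears: on Python 2, NumPy pickles an array's raw data buffer as a Python 2 str, and Python 3's pickle.loads decodes every such str with its `encoding` argument, which defaults to "ASCII". The raw float bytes of an array usually contain bytes above 0x7f, so decoding fails. The sketch below reproduces the effect without NumPy, using a hand-made protocol-2 pickle of a single Python 2 str as a stand-in (an assumption, not a real array pickle); `try_load` is an illustrative helper. NumPy's own documentation for `numpy.load` recommends encoding="latin1" when reading Python 2 pickles that contain arrays.

```python
import pickle

# Hand-made protocol-2 pickle of one Python 2 str (SHORT_BINSTRING opcode "U")
# holding the non-ASCII byte 0xf0, like the raw buffer of a pickled ndarray.
PY2_PICKLE = b"\x80\x02U\x03\xf0ab."


def try_load(encoding):
    """Unpickle PY2_PICKLE with the given encoding; return the error on failure."""
    try:
        return pickle.loads(PY2_PICKLE, encoding=encoding)
    except UnicodeDecodeError as exc:
        return exc


print(repr(try_load("ASCII")))   # the default: a UnicodeDecodeError
print(repr(try_load("latin1")))  # 'ðab'
print(repr(try_load("bytes")))   # b'\xf0ab'
```

With "latin1" each byte becomes exactly one code point, so binary data survives the round trip; "bytes" keeps the raw bytes instead, but then every Python 2 str in the pickle comes back as bytes.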
> 
>
>       Can someone reproduce the problem with PyTables?
>       Is there perhaps a work-around?
>       (And no: I can't regenerate the "old" data files, there are
>       hundreds of .h5 files ... ;-).
>
>       Many thanks, best, Arnd
>
>       ##############################################################################
>       """Illustrate problem with pytables data - python 2 to python 3."""
>
>       from __future__ import print_function
>
>       import sys
>       import numpy as np
>       import tables as tb
> 
>
>       def main():
>            """Run the example."""
>            print("np.__version__=", np.__version__)
>            check_on_same_version = False
>
>            arr1 = np.linspace(0.0, 5.0, 6)
>            arr2 = np.linspace(0.0, 10.0, 11)
>            data = [arr1, arr2]
>
>            # Only write the file on Python 2.x (or on any version if check_on_same_version is set):
>            if sys.version_info[0] < 3 or check_on_same_version:
>                fpt = tb.open_file("tstdat.h5", mode="w")
>                fpt.set_node_attr(fpt.root, "list_of_arrays", data)
>                fpt.close()
>
>            # Load the saved file:
>            fpt = tb.open_file("tstdat.h5", mode="r")
>            result = fpt.get_node_attr("/", "list_of_arrays")
>            fpt.close()
>            print("Loaded:", result)
>
>       if __name__ == "__main__":
>           main()
>
>       _______________________________________________
>       NumPy-Discussion mailing list
>       NumPy-Discussion at scipy.org
>       http://mail.scipy.org/mailman/listinfo/numpy-discussion

