[Numpy-discussion] numpy pickling problem - python 2 vs. python 3

Ryan Nelson rnelsonchem at gmail.com
Thu Mar 5 15:35:15 EST 2015


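The script below works when run under Python 3; the only change from your
version is that the value read back is unpickled by hand with
encoding="latin1". I don't know whether it will *always* work. Judging from
the GH discussion you linked, it is a bit of a hack.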
##############
"""Illustrate problem with pytables data - python 2 to python 3."""

from __future__ import print_function

import sys
import numpy as np
import tables as tb
import pickle as pkl


def main():
     """Run the example."""
     print("np.__version__=", np.__version__)
     check_on_same_version = False

     arr1 = np.linspace(0.0, 5.0, 6)
     arr2 = np.linspace(0.0, 10.0, 11)
     data = [arr1, arr2]

     # Only generate on python 2.X or check on the same python version:
     if sys.version_info[0] < 3 or check_on_same_version:
         fpt = tb.open_file("tstdat.h5", mode="w")
         fpt.set_node_attr(fpt.root, "list_of_arrays", data)
         fpt.close()

     # Load the saved file:
     fpt = tb.open_file("tstdat.h5", mode="r")
     result = fpt.get_node_attr("/", "list_of_arrays")
     fpt.close()
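     # A Python 2 pickle comes back as raw bytes under Python 3, so unpickle
     # it by hand. encoding="latin1" maps every byte to one character.
     print("Loaded:", pkl.loads(result, encoding="latin1"))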

main()
###############
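For what it's worth, here is a minimal sketch of why encoding="latin1" helps,
using only the standard library. The byte string is hand-written to mimic a
protocol-0 pickle of a Python 2 str holding the two bytes 0xf0 0x3f; it is an
illustration, not data taken from your file, and try_load is a made-up helper
name. As far as I understand it, numpy 1.9 and later encodes such a string
back with latin1 when it rebuilds the array buffer, which is why the round
trip checked on the last line matters.

```python
import pickle

# Hand-written stand-in for a Python 2 protocol-0 pickle of the str '\xf0?'
# (assumption: an illustration only, not bytes read from a real .h5 file).
PY2_PICKLE = b"S'\\xf0?'\n."


def try_load(payload, **kwargs):
    """Return the unpickled object, or the exception class name on failure."""
    try:
        return pickle.loads(payload, **kwargs)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# Default: Python 3 tries to decode the Python 2 str as ASCII and fails.
default_result = try_load(PY2_PICKLE)

# latin1 maps each byte 0..255 to exactly one code point, so decoding
# cannot fail and the original bytes can be recovered afterwards.
latin1_result = try_load(PY2_PICKLE, encoding="latin1")

# encoding="bytes" skips decoding and returns the raw bytes instead.
bytes_result = try_load(PY2_PICKLE, encoding="bytes")

print(default_result)
print(repr(latin1_result))
print(bytes_result)
print(latin1_result.encode("latin1") == bytes_result)
```

The first call fails with the same kind of UnicodeDecodeError you quoted; the
other two succeed, and the latin1 string encodes back to the original bytes.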
However, I would consider defining some sort of v2 of your HDF file format
that stores each list of arrays as CArrays or EArrays in the HDF file
(https://pytables.github.io/usersguide/libref/homogenous_storage.html).
Otherwise, what is the advantage of using HDF files over plain shelves?
Just a thought.
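
A rough sketch of what such a layout could look like, assuming PyTables 3.x.
The helper names save_arrays and load_arrays, the "arr_%d" node naming, and
the file name tstdat_v2.h5 are all hypothetical. Each array becomes its own
CArray node under one group, so no pickle is involved and the Python version
that wrote the file should not matter.

```python
import os
import tempfile

import numpy as np
import tables as tb


def save_arrays(path, arrays, group_name="list_of_arrays"):
    """Write each array as its own CArray node under one group."""
    with tb.open_file(path, mode="w") as h5:
        group = h5.create_group("/", group_name)
        for i, arr in enumerate(arrays):
            # obj= lets PyTables infer the atom (dtype) and shape.
            h5.create_carray(group, "arr_%d" % i, obj=arr)


def load_arrays(path, group_name="list_of_arrays"):
    """Read the arrays back in the order in which they were saved."""
    with tb.open_file(path, mode="r") as h5:
        group = h5.get_node("/", group_name)
        return [h5.get_node(group, "arr_%d" % i).read()
                for i in range(group._v_nchildren)]


data = [np.linspace(0.0, 5.0, 6), np.linspace(0.0, 10.0, 11)]
path = os.path.join(tempfile.mkdtemp(), "tstdat_v2.h5")  # hypothetical name
save_arrays(path, data)
loaded = load_arrays(path)
print("Loaded:", loaded)
```

A one-off conversion script could read each old attribute with the latin1
trick and write it back out through something like save_arrays.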
Ryan

On Thu, Mar 5, 2015 at 2:52 AM, Arnd Baecker <arnd.baecker at web.de> wrote:

> Dear all,
>
> when preparing the transition of our repositories from python 2
> to python 3, I encountered a problem loading pytables (.h5) files
> generated using python 2.
> I suspect that it is caused by a problem with pickling numpy arrays
> under python 3:
>
> The code appended at the end of this mail works
> fine on either python 2.7 or python 3.4, however,
> generating the data on python 2 and trying to load
> them on python 3 gives some strange string
> ( b'(lp1\ncnumpy.core.multiarray\n_reconstruct\np2\n(cnumpy\nndarray ...)
> instead of
>     [array([ 0.,  1.,  2.,  3.,  4.,  5.]),
>      array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])]
>
> The problem sounds very similar to the one reported here
>    https://github.com/numpy/numpy/issues/4879
> which was fixed with numpy 1.9.
>
> I tried different versions/combinations of numpy (including 1.9.2)
> and always end up with the above result.
> Also I tried to reduce the problem down to the level of pure numpy
> and pickle (as in the above bug report):
>
>    import numpy as np
>    import pickle
>    arr1 = np.linspace(0.0, 1.0, 2)
>    arr2 = np.linspace(0.0, 2.0, 3)
>    data = [arr1, arr2]
>
>    p = pickle.dumps(data)
>    print(pickle.loads(p))
>    p
>
> Using the resulting string for p as input string
> (with b added at the beginning) under python 3 gives
>    UnicodeDecodeError: 'ascii' codec can't decode
>    byte 0xf0 in position 14: ordinal not in range(128)
>
>
> Can someone reproduce the problem with pytables?
> Is there maybe a work-around?
> (And no: I can't re-generate the "old" data files - it's
> hundreds of .h5 files ... ;-).
>
> Many thanks, best, Arnd
>
>
> ##############################################################################
> """Illustrate problem with pytables data - python 2 to python 3."""
>
> from __future__ import print_function
>
> import sys
> import numpy as np
> import tables as tb
>
>
> def main():
>      """Run the example."""
>      print("np.__version__=", np.__version__)
>      check_on_same_version = False
>
>      arr1 = np.linspace(0.0, 5.0, 6)
>      arr2 = np.linspace(0.0, 10.0, 11)
>      data = [arr1, arr2]
>
>      # Only generate on python 2.X or check on the same python version:
>      if sys.version < "3.0" or check_on_same_version:
>          fpt = tb.open_file("tstdat.h5", mode="w")
>          fpt.set_node_attr(fpt.root, "list_of_arrays", data)
>          fpt.close()
>
>      # Load the saved file:
>      fpt = tb.open_file("tstdat.h5", mode="r")
>      result = fpt.get_node_attr("/", "list_of_arrays")
>      fpt.close()
>      print("Loaded:", result)
>
> main()
>
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>