[Numpy-discussion] numpy pickling problem - python 2 vs. python 3

Arnd Baecker arnd.baecker at web.de
Thu Mar 5 02:52:21 EST 2015


Dear all,

While preparing the transition of our repositories from python 2
to python 3, I encountered a problem loading pytables (.h5) files
generated using python 2.
I suspect that it is caused by a problem with pickling numpy arrays
under python 3:

The code appended at the end of this mail works
fine on either python 2.7 or python 3.4. However,
generating the data on python 2 and trying to load
them on python 3 gives a raw byte string
( b'(lp1\ncnumpy.core.multiarray\n_reconstruct\np2\n(cnumpy\nndarray ...)
instead of
    [array([ 0.,  1.,  2.,  3.,  4.,  5.]),
     array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])]

The problem sounds very similar to the one reported here
   https://github.com/numpy/numpy/issues/4879
which was fixed with numpy 1.9.

I tried different versions/combinations of numpy (including 1.9.2)
and always end up with the above result.
Also I tried to reduce the problem down to the level of pure numpy
and pickle (as in the above bug report):

   import numpy as np
   import pickle
   arr1 = np.linspace(0.0, 1.0, 2)
   arr2 = np.linspace(0.0, 2.0, 3)
   data = [arr1, arr2]

   p = pickle.dumps(data)
   print(pickle.loads(p))
   p

Passing the resulting string for p, as generated under python 2
(with b added at the beginning), to pickle.loads under python 3 gives
   UnicodeDecodeError: 'ascii' codec can't decode
   byte 0xf0 in position 14: ordinal not in range(128)
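
One thing that may be worth trying: the Python 3 pickle documentation
states that encoding="latin1" is required for unpickling numpy arrays
pickled by python 2, because the default "ASCII" decoding rejects the
raw bytes stored in python 2 str objects. The sketch below shows only
that mechanism with the standard library. PY2_PICKLE is a hand-written
stand-in for a python 2 pickle (not the real array data), and
load_py2_pickle is a hypothetical helper name.

```python
import pickle

# Hand-written stand-in for a python 2 protocol-0 pickle of a str that
# holds raw binary data (the two bytes 0xf0 0x3f). This is NOT the real
# numpy pickle from the report, just the smallest input with a
# non-ASCII byte in a python 2 str.
PY2_PICKLE = b"S'\\xf0\\x3f'\n."


def load_py2_pickle(raw):
    """Unpickle python 2 data, mapping every byte 1:1 to a code point."""
    return pickle.loads(raw, encoding="latin1")


# The default encoding ("ASCII") cannot decode byte 0xf0.
try:
    pickle.loads(PY2_PICKLE)
    default_error = None
except UnicodeDecodeError as exc:
    default_error = exc

recovered = load_py2_pickle(PY2_PICKLE)

print("default decoding fails:", default_error is not None)
print("latin1 keeps the bytes:", recovered.encode("latin1") == b"\xf0\x3f")
```

If this carries over, the byte string that get_node_attr returns under
python 3 could be passed to the same helper. That is an untested
assumption on my side, not a confirmed fix for pytables.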


Can someone reproduce the problem with pytables?
Is there maybe a work-around?
(And no: I can't re-generate the "old" data files - it's
hundreds of .h5 files ... ;-).

Many thanks, best, Arnd

##############################################################################
"""Illustrate problem with pytables data - python 2 to python 3."""

from __future__ import print_function

import sys
import numpy as np
import tables as tb


def main():
     """Run the example."""
     print("np.__version__=", np.__version__)
     check_on_same_version = False

     arr1 = np.linspace(0.0, 5.0, 6)
     arr2 = np.linspace(0.0, 10.0, 11)
     data = [arr1, arr2]

     # Only generate on python 2.X or check on the same python version:
     if sys.version_info[0] < 3 or check_on_same_version:
         fpt = tb.open_file("tstdat.h5", mode="w")
         fpt.set_node_attr(fpt.root, "list_of_arrays", data)
         fpt.close()

     # Load the saved file:
     fpt = tb.open_file("tstdat.h5", mode="r")
     result = fpt.get_node_attr("/", "list_of_arrays")
     fpt.close()
     print("Loaded:", result)

main()






