[Numpy-discussion] numpy pickling problem - python 2 vs. python 3

Fri Mar 6 10:37:03 EST 2015

Arnd,

I can see where this is an issue. If you are trying to update your code for
Py3, I still think that it would really help to add a version attribute of
some sort to your new HDF files. You can then write a little check in your
access code that looks for this variable. If it is not present, you know
that it is an old file, and you can use the trick that I gave you.
Otherwise, it will process the file as normal. It could even throw a little
error saying that the file is outdated. You could write a small conversion
script that could run through old files and reprocess them into the new
format. Fortunately, Python is pretty good at automating tasks, even for
hundreds of files :)
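A minimal sketch of the version check and conversion idea above, assuming a plain Python dict stands in for an HDF node's attribute set. Real PyTables or h5py attribute access looks different. `VERSION_KEY`, `CURRENT_VERSION`, `read_value` and `convert` are hypothetical names. The "trick" for old files is assumed here to be re-unpickling the raw Python 2 bytes with `pickle.loads(..., encoding="latin1")`; substitute whatever fallback you actually use.

```python
import pickle

# Hypothetical attribute name and value written by the new (Python 3) code.
VERSION_KEY = "format_version"
CURRENT_VERSION = 2


def read_value(attrs, name, strict=False):
    """Return attribute `name` from `attrs`.

    `attrs` is a plain dict standing in for the attribute set of an HDF
    node (an assumption; PyTables/h5py attribute access looks different).
    """
    if VERSION_KEY in attrs:
        # New file: the value was stored in the new format, use it as-is.
        return attrs[name]
    if strict:
        # Alternative policy: refuse old files and point at the converter.
        raise ValueError("file predates %r; run the conversion script" % VERSION_KEY)
    # Old Python 2 file: assume the value is still the raw pickle bytes and
    # apply the fallback (assumed here to be encoding="latin1").
    return pickle.loads(attrs[name], encoding="latin1")


def convert(attrs):
    """Upgrade an old attribute set in place and stamp it with a version."""
    if VERSION_KEY in attrs:
        return attrs  # already in the new format
    for key in list(attrs):
        attrs[key] = read_value(attrs, key)
    attrs[VERSION_KEY] = CURRENT_VERSION
    return attrs
```

A batch conversion script would then loop over the old files, call `convert` on each node's attributes, and write them back, so the fallback path only ever runs once per file.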

It might be informative to ask on the PyTables list to see what they've
done. The Pandas folks also do a lot with HDF files, and they have
certainly worked their way through the Py2-3 transition. Also, because this
is an issue with Python pickle, a quick question on Stack Overflow might get
some hits.

I tried your script using a list of lists rather than a list of arrays, and
the same problem persists. So, as Pauli notes, this is going to be a problem
regardless of the type of attributes you set; I think you're just going to
have to hard-code some kind of check in your code to switch behavior.

I recently switched to using Py3 exclusively, and although it was painful at
first, I'm quite happy with Py3 overall. I also use the Anaconda Python
distribution, which makes it very easy to have Py2 and Py3 environments if
you need to switch back and forth.

Sorry if that doesn't help much. Just some thoughts from my recent
conversion experiences.

Ryan

On Fri, Mar 6, 2015 at 9:48 AM, Arnd Baecker <arnd.baecker at web.de> wrote:

> On Fri, 6 Mar 2015, Pauli Virtanen wrote:
>
> > Arnd Baecker <arnd.baecker <at> web.de> writes:
> > [clip]
> >> Still, I would have thought that this should work out of the box,
> >> i.e. without the pickle.loads trick?
> >
> > Pickle files should be considered incompatible between Python 2 and Python 3.
> >
> > Python 3 interprets all bytes objects saved by Python 2 as str and attempts
> > to decode them using some text encoding. The default encoding is ASCII, so
> > it will simply fail in most cases if the files contain any binary data.
> >
> > Failing by default is also the right thing to do, since the saved bytes
> > objects might actually represent strings in some encoding, and ASCII is the
> > safest guess.
> >
> > This behavior comes from Python's pickle module and does not depend on
> > NumPy.
>
> Thanks a lot for the explanation!
>
> So what is the recommended way to save data under Python 2 so that
> it can still be loaded under Python 3?
>
> For example, using np.save with a list of arrays works fine on
> either Python 2 or Python 3.
> However, it does not work if one tries to open, under Python 3,
> a file generated earlier under Python 2.
> (Again, because pickle is involved internally:
>    "python3.4/site-packages/numpy/lib/npyio.py",
>    line 393, in load
>        return format.read_array(fid)
>    File "python34/lib/python3.4/site-packages/numpy/lib/format.py",
>    line 602, in read_array
>        array = pickle.load(fp)
>    UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0  ...)
>
> Just to be clear: I don't want to beat a dead horse here. For my usage
> via PyTables, I was able to solve the loading of old files following
> Ryan's solutions. Personally, I don't use .npy files.
> Maybe saving a list containing arrays is an unusual example ...
>
> Still, I am a little bit worried about backwards compatibility:
> being able to load old data files is important, because it makes
> it possible to check whether current code still reproduces
> previously obtained (maybe also published) results.
>
> Best, Arnd
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
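Pauli's point in the quoted thread, that Python 3 decodes Python 2 `str` pickles as ASCII by default, can be reproduced without an old file. The bytes below are hand-assembled to mirror what Python 2's `pickle.dumps('\xf0', 2)` emits for a one-byte `str`; treat that correspondence as an assumption worth checking against a real Python 2 interpreter. `pickle.loads` defaults to `encoding="ASCII"`. Passing `encoding="latin1"` maps each byte to one code point losslessly, and `encoding="bytes"` returns `bytes` objects instead. `PY2_PICKLE` and `try_load` are illustrative names.

```python
import pickle

# Hand-assembled stand-in for Python 2's pickle.dumps('\xf0', 2):
# PROTO 2, SHORT_BINSTRING of length 1 holding byte 0xf0, BINPUT 0, STOP.
PY2_PICKLE = b"\x80\x02U\x01\xf0q\x00."


def try_load(data, **kwargs):
    """Unpickle `data`, returning the UnicodeDecodeError instead of raising it."""
    try:
        return pickle.loads(data, **kwargs)
    except UnicodeDecodeError as exc:
        return exc


# Default encoding="ASCII": fails on the non-ASCII byte, as in the traceback.
print(repr(try_load(PY2_PICKLE)))
# encoding="latin1": every byte maps to exactly one code point.
print(repr(try_load(PY2_PICKLE, encoding="latin1")))
# encoding="bytes": Python 2 str objects come back as Python 3 bytes.
print(repr(try_load(PY2_PICKLE, encoding="bytes")))
```

Later NumPy releases expose the same choice through the `encoding` argument of `np.load` (together with `allow_pickle=True` when the file holds object arrays, such as a saved list of arrays).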