[Numpy-discussion] numpy pickling problem - python 2 vs. python 3

Sebastian sebix at sebix.at
Fri Mar 6 12:34:27 EST 2015


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi all,

As this also affects .npy files, which use pickle internally, why can't
this be handled by NumPy itself? In my opinion, this breaks backwards
compatibility very badly.

The company I worked for uses NumPy and related packages a lot and also
has a lot of data in .npy and pickle files. They currently work with 2.7,
but I also tried to make my programs compatible with Py 3. This was not
possible when it came to dumping and loading .npy files. I think this
will be a major reason why people won't take the step forward to Py3,
and why NumPy is not considered compatible with Python 3.

just my 5 cents,
Sebastian

On 03/06/2015 04:37 PM, Ryan Nelson wrote:
> Arnd,
>
> I can see where this is an issue. If you are trying to update your
> code for Py3, I still think that it would really help to add a version
> attribute of some sort to your new HDF files. You can then write a
> little check in your access code that looks for this attribute. If it is
> not present, you know that it is an old file, and you can use the trick
> that I gave you. Otherwise, it will process the file as normal. It could
> even raise a little error saying that the file is outdated. You could
> write a small conversion script that runs through old files and
> reprocesses them into the new format. Fortunately, Python is pretty good
> at automating tasks, even for hundreds of files :)
>
> It might be informative to ask on the PyTables list to see what
> they've done. The Pandas folks also do a lot with HDF files, and they
> have certainly worked their way through the Py2-3 transition. Also,
> because this is an issue with Python pickle, a quick note on SO might
> get some hits.
>
> I tried your script using a list of lists, rather than a list of
> arrays, and the same problem still persists. So, as Pauli notes, this is
> going to be a problem regardless of the type of attributes you set. I
> think you're just going to have to hard-code some kind of check in your
> code to switch behavior.
>
> I recently switched to using Py3 exclusively, and although it was
> painful at first, I'm quite happy with Py3 overall. I also use the
> Anaconda Python distribution, which makes it very easy to have Py2 and
> Py3 environments if you need to switch back and forth.
>
> Sorry if that doesn't help much. Just some thoughts from my recent
> conversion experiences.
>
> Ryan
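
Ryan's version check could be sketched roughly as follows. This is only
an illustration under stated assumptions: attrs is a plain dict standing
in for the attributes of a file, format_version is a hypothetical
attribute name, load_attribute is a made-up helper, and the fallback
assumes that "the trick" means unpickling with encoding="latin1", which
is not spelled out in this message.

```python
import pickle


def load_attribute(raw, attrs):
    """Unpickle raw bytes, picking the decoding from a version marker.

    attrs is a plain dict standing in for the file's attributes, and
    "format_version" is a hypothetical name chosen for this sketch.
    """
    if "format_version" not in attrs:
        # No marker: assume an old file written under Python 2, whose
        # str objects may hold arbitrary binary data.
        return pickle.loads(raw, encoding="latin1")
    # Marker present: file was written by the new code, load normally.
    return pickle.loads(raw)


# Hand-written protocol-2 pickle of the Python 2 str '\xf0\x9f'.
old_raw = b"\x80\x02U\x02\xf0\x9fq\x00."
old_value = load_attribute(old_raw, {})

# A pickle written by the running Python 3, tagged with a version.
new_raw = pickle.dumps([1.0, 2.0])
new_value = load_attribute(new_raw, {"format_version": 1})
```

A conversion script would call the same helper for every old file and
write the result back together with the version marker.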
>
>
>
> On Fri, Mar 6, 2015 at 9:48 AM, Arnd Baecker <arnd.baecker at web.de> wrote:
>
>     On Fri, 6 Mar 2015, Pauli Virtanen wrote:
>
>     > Arnd Baecker <arnd.baecker at web.de> writes:
>     > [clip]
>     >> Still I would have thought that this should be working
>     >> out-of-the-box, i.e. without the pickle.loads trick?
>     >
>     > Pickle files should be considered incompatible between Python 2
>     > and Python 3.
>     >
>     > Python 3 interprets all bytes objects saved by Python 2 as str
>     > and attempts to decode them using some encoding. The default
>     > encoding is ASCII, so it will simply fail in most cases if the
>     > files contain any binary data.
>     >
>     > Failing by default is also the right thing to do, since the
>     > saved bytes objects might actually represent strings in some
>     > encoding, and ASCII is the safest guess.
>     >
>     > This behavior is that of Python's pickle module, and does not
>     > depend on Numpy.
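
To make Pauli's point concrete, here is a minimal sketch for Python 3
using only the standard library. The byte string PY2_PICKLE is written
by hand for illustration and stands in for what Python 2 writes for a
str holding binary data; the names PY2_PICKLE and try_default are made
up for this example.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 str '\xf0\x9f\x98\x80':
# PROTO 2, SHORT_BINSTRING (length 4), BINPUT 0, STOP.
PY2_PICKLE = b"\x80\x02U\x04\xf0\x9f\x98\x80q\x00."


def try_default(data):
    """Return the error raised by a default load, or None on success."""
    try:
        pickle.loads(data)  # default is encoding="ASCII"
    except UnicodeDecodeError as exc:
        return exc
    return None


# The default load fails on the non-ASCII byte 0xf0.
err = try_default(PY2_PICKLE)

# encoding="bytes" keeps Python 2 str objects as raw bytes.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")

# encoding="latin1" maps every byte to one code point, so it never
# fails, but the result is a str that may need re-encoding.
as_latin1 = pickle.loads(PY2_PICKLE, encoding="latin1")
```

Which of the two encodings is appropriate depends on whether the saved
objects were meant as text or as binary data, which pickle cannot know.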
>
>     Thanks a lot for the explanation!
>
>     So what is then the recommended way to save data under python 2 so
>     that it can still be loaded under python 3?
>
>     For example, using np.save with a list of arrays works fine
>     on either python 2 or python 3.
>     However, it does not work if one tries to open under python 3
>     a file previously generated on python 2.
>     Again, this is because pickle is involved internally:
>        "python3.4/site-packages/numpy/lib/npyio.py",
>        line 393, in load  return format.read_array(fid)
>        File "python34/lib/python3.4/site-packages/numpy/lib/format.py",
>        line 602, in read_array  array = pickle.load(fp)
>        UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0  ...
>
>     Just to be clear: I don't want to beat a dead horse here - for my
>     usage via pytables I was able to solve the loading of old files
>     following Ryan's solutions. Personally I don't use .npy files.
>     Maybe saving a list containing arrays is an unusual example ...
>
>     Still, I am a little bit worried about backwards compatibility:
>     being able to load old data files is important, because it makes
>     it possible to check whether current code still reproduces
>     previously obtained (maybe also published) results.
>
>     Best, Arnd
>
>     _______________________________________________
>     NumPy-Discussion mailing list
>     NumPy-Discussion at scipy.org
>     http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> --
> python programming - mail server - photo - video - https://sebix.at
> To verify my cryptographic signature or send me encrypted mails, get my
> key at https://sebix.at/DC9B463B.asc and on public keyservers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJU+eUjAAoJEBn0X+vcm0Y7/WcQAK1iH3VHffrgEAFq7FU+aDw1
qAkKDcBi82aByr5v3S9zRRpcvYexk0tcNhQCoHUAGZHBCia86Ix1NLx8JT79SjFs
wJMxYN8X8r8UcZEuhzw1tMJsflo7UY79CkkzIWPBbdtu5xiVCYkq3O8c3FU3NpZK
9xJPZ5W8+i9pkRDh6i36MuMtncfkbVMTkbo0Dp8DMkkRbQdvK8dfL3NJKZ8dRaIz
zYOBBtgVMNcRFvwUnyE+lPYVp2bsDazIoa+6JIvlkWz86Rj6knC5Ehs6L710Bk1G
LN0/taZhvRlImLrF8QLgZIhYCpXV45quc8dhkQDP6TOM+9j1LadvfstHPHlCfLBF
N4VI7aWKXfAcShb8puaJdLz+F78+esJ7S0tWzRk6ZeJkoY1fBr3kvi3kvyUyy9g/
wV+MQnV1ioptmW+twnmo33AY4IA0qxjwB0uM0PcjjWZY7PrunnDtJRKDll+ruWEm
UByUGtu881AbCMVnbTqpoJ+Ri12U0VR8gDn8zHVIUO6Q11v5cMuSOJTV0rls+n2E
+7UZCL70UUUYBc//fclUvJ2MOxtfbRFqu3hvghCI5weJmAIn8r7O2D1/2mQvgjgn
TqALF/zzJxoHS0EgjjbEsIMFkS1s8NiRJmPD3hWfOteyOogn3GHRYkaYov4YQGD3
YYfdjIWviS0meKMdQD59
=fI60
-----END PGP SIGNATURE-----




