numpy pickling problem - python 2 vs. python 3

Dear all,

when preparing the transition of our repositories from python 2 to python 3, I encountered a problem loading pytables (.h5) files generated using python 2. I suspect that it is caused by a problem with pickling numpy arrays under python 3:

The code appended at the end of this mail works fine on either python 2.7 or python 3.4. However, generating the data on python 2 and trying to load them on python 3 gives some strange string

    b'(lp1\ncnumpy.core.multiarray\n_reconstruct\np2\n(cnumpy\nndarray ...

instead of

    [array([ 0., 1., 2., 3., 4., 5.]),
     array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])]

The problem sounds very similar to the one reported here
https://github.com/numpy/numpy/issues/4879
which was fixed with numpy 1.9.

I tried different versions/combinations of numpy (including 1.9.2) and always end up with the above result. I also tried to reduce the problem down to the level of pure numpy and pickle (as in the above bug report):

    import numpy as np
    import pickle
    arr1 = np.linspace(0.0, 1.0, 2)
    arr2 = np.linspace(0.0, 2.0, 3)
    data = [arr1, arr2]

    p = pickle.dumps(data)
    print(pickle.loads(p))
    p

Using the resulting string for p as input string (with b added at the beginning) under python 3 gives

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 14: ordinal not in range(128)

Can someone reproduce the problem with pytables? Is there maybe a work-around? (And no: I can't re-generate the "old" data files - it's hundreds of .h5 files ... ;-).

Many thanks, best, Arnd

##############################################################################
"""Illustrate problem with pytables data - python 2 to python 3."""

from __future__ import print_function

import sys
import numpy as np
import tables as tb


def main():
    """Run the example."""
    print("np.__version__=", np.__version__)
    check_on_same_version = False

    arr1 = np.linspace(0.0, 5.0, 6)
    arr2 = np.linspace(0.0, 10.0, 11)
    data = [arr1, arr2]

    # Only generate on python 2.X or check on the same python version:
    if sys.version < "3.0" or check_on_same_version:
        fpt = tb.open_file("tstdat.h5", mode="w")
        fpt.set_node_attr(fpt.root, "list_of_arrays", data)
        fpt.close()

    # Load the saved file:
    fpt = tb.open_file("tstdat.h5", mode="r")
    result = fpt.get_node_attr("/", "list_of_arrays")
    fpt.close()
    print("Loaded:", result)


main()

This works if run from Py3. Don't know if it will *always* work. From that GH discussion you linked, it sounds like that is a bit of a hack.

##############
"""Illustrate problem with pytables data - python 2 to python 3."""

from __future__ import print_function

import sys
import numpy as np
import tables as tb
import pickle as pkl


def main():
    """Run the example."""
    print("np.__version__=", np.__version__)
    check_on_same_version = False

    arr1 = np.linspace(0.0, 5.0, 6)
    arr2 = np.linspace(0.0, 10.0, 11)
    data = [arr1, arr2]

    # Only generate on python 2.X or check on the same python version:
    if sys.version < "3.0" or check_on_same_version:
        fpt = tb.open_file("tstdat.h5", mode="w")
        fpt.set_node_attr(fpt.root, "list_of_arrays", data)
        fpt.close()

    # Load the saved file:
    fpt = tb.open_file("tstdat.h5", mode="r")
    result = fpt.get_node_attr("/", "list_of_arrays")
    fpt.close()
    print("Loaded:", pkl.loads(result, encoding="latin1"))


main()
###############

However, I would consider defining some sort of v2 of your HDF file format, which converts all of the lists of arrays to CArrays or EArrays in the HDF file. (https://pytables.github.io/usersguide/libref/homogenous_storage.html) Otherwise, what is the advantage of using HDF files over just plain shelves?... Just a thought.

Ryan

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

On Thu, 5 Mar 2015, Ryan Nelson wrote:
> This works if run from Py3. Don't know if it will *always* work. From that GH discussion you linked, it sounds like that is a bit of a hack.

Great - based on your code I could modify my loader routine so that on python 3 it can load the files generated on python 2. Many thanks!

Still I would have thought that this should be working out of the box, i.e. without the pickle.loads trick?

[... code ...]

> However, I would consider defining some sort of v2 of your HDF file format, which converts all of the lists of arrays to CArrays or EArrays in the HDF file. (https://pytables.github.io/usersguide/libref/homogenous_storage.html) Otherwise, what is the advantage of using HDF files over just plain shelves?... Just a thought.

Thanks for the suggestion - in our usage scenario lists of arrays are a border case, and only small parts of the data in the files have this. The larger arrays are written directly. So at this point I don't mind if the lists of arrays are written in the current way (as long as things load fine).

For our applications the main benefit of using HDF files is the possibility to easily look into them (e.g. using vitables) - so this means that I don't use all the nice, more advanced features of HDF at this point... ;-).

Again many thanks for the prompt reply and solution!

Best, Arnd

Arnd Baecker <arnd.baecker <at> web.de> writes:
[clip]
> Still I would have thought that this should be working out-of-the box, i.e. without the pickle.loads trick?

Pickle files should be considered incompatible between Python 2 and Python 3.

Python 3 interprets all bytes objects saved by Python 2 as str and attempts to decode them under some unicode locale. The default locale is ASCII, so it will simply just fail in most cases if the files contain any binary data.

Failing by default is also the right thing to do, since the saved bytes objects might actually represent strings in some locale, and ASCII is the safest guess.

This behavior is that of Python's pickle module, and does not depend on Numpy.
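
The behaviour described above can be reproduced without a Python 2 interpreter at hand. The following is a minimal sketch, not code from the thread: the `PY2_PICKLE` literal is a hand-written protocol-0 pickle of the shape Python 2 emits for a list holding one `str` with the byte 0xf0, and it is loaded under Python 3 with the standard `encoding` argument of `pickle.loads`.

```python
import pickle

# Hand-written protocol-0 pickle, shaped like Python 2 output for the
# list ['\xf0'] (one byte string with a non-ASCII byte).
# This literal is an illustrative assumption, not data from the thread.
PY2_PICKLE = b"(lp0\nS'\\xf0'\np1\na."

# Default: Python 3 decodes Python 2 str objects as ASCII and fails.
try:
    pickle.loads(PY2_PICKLE)
    default_result = "loaded"
except UnicodeDecodeError:
    default_result = "UnicodeDecodeError"

# latin1 maps every byte 0..255 to a code point, so decoding cannot fail.
as_text = pickle.loads(PY2_PICKLE, encoding="latin1")

# encoding="bytes" skips decoding and hands back the raw bytes instead.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")

print(default_result, as_text, as_bytes)
```

Which of the two encodings is appropriate depends on whether the Python 2 `str` held text or binary data, which is exactly the ambiguity that makes failing by default reasonable.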

On Fri, 6 Mar 2015, Pauli Virtanen wrote:
> Pickle files should be considered incompatible between Python 2 and Python 3.
> [clip]
> This behavior is that of Python's pickle module, and does not depend on Numpy.

Thanks a lot for the explanation!

So what is then the recommended way to save data under python 2 so that they can still be loaded under python 3?

For example, using np.save with a list of arrays works fine either on python 2 or on python 3. However, it does not work if one tries to open under python 3 a file generated before on python 2. Again, this is because pickle is involved internally:

    "python3.4/site-packages/numpy/lib/npyio.py", line 393, in load
      return format.read_array(fid)
    File "python34/lib/python3.4/site-packages/numpy/lib/format.py", line 602, in read_array
      array = pickle.load(fp)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 ...

Just to be clear: I don't want to beat a dead horse here - for my usage via pytables I was able to solve the loading of old files following Ryan's solutions. Personally I don't use .npy files. Maybe saving a list containing arrays is an unusual example ...

Still, I am a little bit worried about backwards-compatibility: being able to load old data files is an important issue, as by this it is possible to check whether current code still reproduces previously obtained (maybe also published) results.

Best, Arnd
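
One way to sidestep pickle for a list of numeric arrays is to store each array under its own name in an .npz archive, so that no object array is ever created. The sketch below is an illustration rather than a recommendation made in the thread; it assumes a NumPy release newer than the ones discussed here, in which `np.load` accepts `allow_pickle`, and it uses an in-memory buffer in place of a real file.

```python
import io
import numpy as np

arr1 = np.linspace(0.0, 5.0, 6)
arr2 = np.linspace(0.0, 10.0, 11)

# Store each array under its own name instead of saving the list itself.
# Plain float arrays are written as raw binary data, without pickle.
buf = io.BytesIO()  # stands in for a file on disk
np.savez(buf, arr_0=arr1, arr_1=arr2)

buf.seek(0)
with np.load(buf, allow_pickle=False) as npz:
    # Rebuild the list in a fixed order from the stored names.
    restored = [npz["arr_%d" % i] for i in range(len(npz.files))]

print(restored)
```

Because nothing in such a file is pickled, loading it does not depend on the Python version that wrote it.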

Arnd,

I can see where this is an issue. If you are trying to update your code for Py3, I still think that it would really help to add a version attribute of some sort to your new HDF files. You can then write a little check in your access code that looks for this variable. If it is not present, you know that it is an old file, and you can use the trick that I gave you. Otherwise, it will process the file as normal. It could even throw a little error saying that the file is outdated. You could write a small conversion script that could run through old files and reprocess them into the new format. Fortunately, Python is pretty good at automating tasks, even for hundreds of files :)

It might be informative to ask at the PyTables list to see what they've done. The Pandas folks also do a lot with HDF files, and they have certainly worked their way through the Py2-3 transition. Also, because this is an issue with Python pickle, a quick note on SO might get some hits.

I tried your script using a list of lists, rather than a list of arrays, and the same problem still persists. So, as Pauli notes, this is going to be a problem regardless of the type of attributes you set. I think you're just going to have to hard code some kind of check in your code to switch behavior.

I recently switched to using Py3 exclusively, and although it was painful at first, I'm quite happy with Py3 overall. I also use the Anaconda Python distribution, which makes it very easy to have Py2 and Py3 environments if you need to switch back and forth.

Sorry if that doesn't help much. Just some thoughts from my recent conversion experiences.

Ryan
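
The version-attribute check Ryan describes could look roughly like the sketch below. Everything named here is hypothetical: `format_version`, `FORMAT_VERSION` and `read_list_attr` are made up for illustration, and a plain dict stands in for the attribute set of an HDF5 node so that the example runs without PyTables.

```python
import pickle

FORMAT_VERSION = 2  # hypothetical version number written by new code


def read_list_attr(attrs, name):
    """Return attribute `name`, handling files written without a version."""
    value = attrs[name]
    if "format_version" in attrs:
        # New-style file: the value is stored in a directly usable form.
        return value
    # Old file: the value arrives as the raw Python 2 pickle string,
    # so apply the latin1 trick from earlier in the thread.
    return pickle.loads(value, encoding="latin1")


# Plain dicts standing in for the attributes of an old and a new file.
old_file = {"names": b"(lp0\nS'caf\\xe9'\np1\na."}
new_file = {"format_version": FORMAT_VERSION, "names": ["caf\xe9"]}

print(read_list_attr(old_file, "names"))
print(read_list_attr(new_file, "names"))
```

A conversion script would apply the same branch once per old file and then write the result back together with the version attribute.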

Hi all,

As this also affects .npy files, which use pickle internally, why can't this be done by Numpy itself? This breaks backwards compatibility in a very bad way in my opinion.

The company I worked for uses Numpy and related packages a lot and also has a lot of data in .npy and pickle files. They currently work with 2.7, but I also tried to develop my programs to be compatible with Py 3. But this was not possible when it came to the point of dumping and loading npy files. I think this will be a major reason why people won't take the step forward to Py3, and why Numpy is not considered to be compatible with Python 3.

just my 5 cents,
Sebastian

On Fri, Mar 6, 2015 at 10:34 AM, Sebastian <sebix@sebix.at> wrote:
> As this also affects .npy files, which uses pickle internally, why can't this be done by Numpy itself? This breaks backwards compatibility in a very bad way in my opinion.
> [clip]

Are you suggesting adding a flag to the files to mark the python version in which they were created? The *.npy format is versioned, so something could probably be done with that.

Chuck
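
For reference, the version Chuck mentions is stored in the first bytes of every .npy file and can be read with `numpy.lib.format.read_magic`. The sketch below only shows where that number lives; it is the version of the file format, not of the Python interpreter that wrote the file, so recording the latter would need an extension along the lines he suggests.

```python
import io
import numpy as np
from numpy.lib import format as npy_format

buf = io.BytesIO()  # stands in for a .npy file on disk
np.save(buf, np.linspace(0.0, 1.0, 2))

buf.seek(0)
# read_magic checks the b"\x93NUMPY" prefix and returns (major, minor).
version = npy_format.read_magic(buf)
print("npy format version:", version)
```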

A slightly different way to look at this is one of sharing data. If I am working on a system with 3.4 and I want to share data with others who may be using a mix of 2.7 and 3.3 systems, this problem makes npz format much less attractive.

Ben Root

06.03.2015, 20:00, Benjamin Root wrote:
> A slightly different way to look at this is one of sharing data. If I am working on a system with 3.4 and I want to share data with others who may be using a mix of 2.7 and 3.3 systems, this problem makes npz format much less attractive.

pickle is used in npy files only if there are object arrays in them. Of course, savez could just decline saving object arrays.
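
Whether a given array would trigger the pickle path can be checked up front through its dtype. A minimal sketch, in which the helper name `needs_pickle` is made up for illustration:

```python
import numpy as np

arr1 = np.linspace(0.0, 5.0, 6)
arr2 = np.linspace(0.0, 10.0, 11)

# A container of arrays with different lengths has to be an object array.
ragged = np.empty(2, dtype=object)
ragged[0] = arr1
ragged[1] = arr2


def needs_pickle(arr):
    """True if the dtype holds Python objects, which npy stores via pickle."""
    return arr.dtype.hasobject


print(needs_pickle(arr1), needs_pickle(ragged))
```

A plain float array is written as raw binary data, while the ragged container of the same data is not, which matches the list-of-arrays case that started the thread.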

On 2015/03/06 10:23 AM, Pauli Virtanen wrote:
> pickle is used in npy files only if there are object arrays in them. Of course, savez could just decline saving object arrays.

Or issue a prominent warning.

Eric

On 07.03.2015 00:20, Pauli Virtanen wrote:
> 06.03.2015, 22:43, Eric Firing wrote:
>> Or issue a prominent warning.

I think the ship for a warning has long sailed. At this point it's probably more of an annoyance for python3 users and will not prevent many more python2 users from saving files that can't be loaded into python3.

On 2015/03/06 1:29 PM, Julian Taylor wrote:
> I think the ship for a warning has long sailed. At this point it's probably more of an annoyance for python3 users and will not prevent many more python2 users from saving files that can't be loaded into python3.

The point of a warning is that anything that relies on pickles is fundamentally unreliable in the long term. It's potentially a surprise that the npz format relies on pickles.

07.03.2015, 01:29, Julian Taylor wrote:
> I think the ship for a warning has long sailed. At this point it's probably more of an annoyance for python3 users and will not prevent many more python2 users from saving files that can't be loaded into python3.

How about an extra use_pickle=True kwarg that can be used to disable using pickle altogether in these routines?

Another reason to do this is arbitrary code execution when loading pickles: https://www.cs.jhu.edu/~s/musings/pickle.html

Easily demonstrated also with npy files (loading this file will only print something unexpected, nothing more malicious): http://pav.iki.fi/tmp/unexpected.npy
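
The mechanism behind that code execution is pickle's `__reduce__` protocol: the file names a callable and its arguments, and the loader calls it. The sketch below is a harmless stand-in written for illustration (it is not the contents of the linked file); the class name `Surprise` and the evaluated expression are arbitrary choices.

```python
import pickle


class Surprise:
    """On unpickling, the loader calls the callable named in __reduce__."""

    def __reduce__(self):
        # (callable, args): pickle.loads will run eval("6 * 7").
        return (eval, ("6 * 7",))


payload = pickle.dumps(Surprise())

# The result is not a Surprise instance: it is whatever the call returned.
result = pickle.loads(payload)
print(result)
```

Any importable callable could take the place of `eval` here, which is why loading a pickle from an untrusted source amounts to running code from that source.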

On Sat, Mar 7, 2015 at 9:54 AM, Pauli Virtanen <pav@iki.fi> wrote:
> How about an extra use_pickle=True kwarg that can be used to disable using pickle altogether in these routines?

If we do, I'd vastly prefer `forbid_pickle=False`. The use_pickle spelling suggests that you are asking it to use pickle when it otherwise wouldn't, which is not the intention.

--
Robert Kern

On Sat, 2015-03-07 at 10:23 +0000, Robert Kern wrote:
> On Sat, Mar 7, 2015 at 9:54 AM, Pauli Virtanen <pav@iki.fi> wrote:
>> How about an extra use_pickle=True kwarg that can be used to disable using pickle altogether in these routines?
>
> If we do, I'd vastly prefer `forbid_pickle=False`. The use_pickle spelling suggests that you are asking it to use pickle when it otherwise wouldn't, which is not the intention.

I like the idea, at least for loading. Could also call it `allow_objects` with an explanation in the documentation. I would consider deprecating it and not allowing pickles by default, but I am not sure whether that is going too far. However, I think we should be able to safely share data using npy.

- Sebastian
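
For readers on a current NumPy: none of the spellings proposed in this exchange (`use_pickle`, `forbid_pickle`, `allow_objects`) is what NumPy releases after this thread provide. There the keyword is `allow_pickle`, accepted by both `np.save` and `np.load`. The sketch below assumes such a release and uses in-memory buffers in place of files.

```python
import io
import numpy as np

obj = np.empty(2, dtype=object)
obj[0] = np.linspace(0.0, 5.0, 6)
obj[1] = np.linspace(0.0, 10.0, 11)

# Saving: refuse to fall back to pickle for an object array.
try:
    np.save(io.BytesIO(), obj, allow_pickle=False)
    save_refused = False
except ValueError:
    save_refused = True

# Loading: a file that does contain a pickle is rejected unless allowed.
buf = io.BytesIO()
np.save(buf, obj, allow_pickle=True)

buf.seek(0)
try:
    np.load(buf, allow_pickle=False)
    load_refused = False
except ValueError:
    load_refused = True

buf.seek(0)
loaded = np.load(buf, allow_pickle=True)

print(save_refused, load_refused, loaded.dtype)
```

With `allow_pickle=False` on both sides, an .npy file can only ever contain raw array data, which is the "safely share data" property asked for above.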

06.03.2015, 22:23, Pauli Virtanen wrote:
> pickle is used in npy files only if there are object arrays in them. Of course, savez could just decline saving object arrays.

np.load is missing the Py2-3 workaround flags that pickle.load has; they could probably be added: https://github.com/numpy/numpy/pull/5640

participants (11)
- Anrd Baecker
- Arnd Baecker
- Benjamin Root
- Charles R Harris
- Eric Firing
- Julian Taylor
- Pauli Virtanen
- Robert Kern
- Ryan Nelson
- Sebastian
- Sebastian Berg