[pypy-dev] pickle numpy array from pypy to cpython?

Fri Jun 24 23:21:41 EDT 2016

Heh, interestingly, if I add the following package to the local dir
when trying to unpickle under cpython, it works (note that cpython to
pypy actually works out of the box, which I hadn't realized):

$ cat _numpypy/__init__.py
from numpy.core import *

$ cat _numpypy/multiarray.py
from numpy.core.multiarray import *
import numpy.core.multiarray as _ncm
_reconstruct = _ncm._reconstruct

This is obviously a total hack, and not one I'm comfortable with
(since I need to use this codebase from both cpython and pypy), but it
demonstrates that it's just bookkeeping that needs to change to get
things to work.
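
A less invasive variant of the same bookkeeping, as a rough sketch only: do
the renaming at load time by overriding Unpickler.find_class instead of
dropping shim modules on the path. The names RemappingUnpickler,
loads_remapped and MODULE_REMAP are made up for illustration, and the
mapping itself is an assumption based on the module names seen in this
thread.

```python
import io
import pickle

# Assumed mapping: module name written by pypy -> module name on cpython.
MODULE_REMAP = {"_numpypy.multiarray": "numpy.core.multiarray"}


class RemappingUnpickler(pickle.Unpickler):
    """Unpickler that redirects selected module names before importing them."""

    def __init__(self, file, remap=None, **kwargs):
        pickle.Unpickler.__init__(self, file, **kwargs)
        self._remap = MODULE_REMAP if remap is None else remap

    def find_class(self, module, name):
        # Swap the module name if it is in the table, then defer to pickle.
        module = self._remap.get(module, module)
        return pickle.Unpickler.find_class(self, module, name)


def loads_remapped(data, remap=None):
    """Like pickle.loads, but with module names passed through the remap."""
    return RemappingUnpickler(io.BytesIO(data), remap=remap).load()
```

Usage would be something like loads_remapped(open('pp123.pkl', 'rb').read())
under cpython. It still means every reader has to know about the remap, so
it does not replace fixing the pickling side.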

My first approach would be to add a wrapper around save_global here
https://bitbucket.org/pypy/pypy/src/a0105e0d00dbd0f73d06fc704db704868a6c6ed2/lib-python/2.7/pickle.py?at=default&fileviewer=file-view-default#pickle.py-814
that special-cases the global '_numpypy.multiarray' to instead be
'numpy.core.multiarray'. Does that seem like a reasonable thing to do?
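
To illustrate the effect without touching pickle.py, here is a sketch that
post-processes an existing pickle instead of wrapping save_global (so a
different technique from the one proposed above). It uses
pickletools.genops to locate GLOBAL opcodes and rewrites the ones listed in
a table. rewrite_globals and GLOBAL_REMAP are hypothetical names, and the
table only covers the one difference seen in the disassembly quoted below.
It only handles the text GLOBAL opcode used by protocols 0 through 2.

```python
import pickletools

# Assumed mapping, in the "module name" form that pickletools reports.
GLOBAL_REMAP = {
    "_numpypy.multiarray _reconstruct": "numpy.core.multiarray _reconstruct",
}


def _encode_global(arg):
    """Build the raw bytes of a GLOBAL opcode from a 'module name' string."""
    module, name = arg.split(" ", 1)
    return ("c%s\n%s\n" % (module, name)).encode("ascii")


def rewrite_globals(data, remap=GLOBAL_REMAP):
    """Return a copy of data with GLOBAL opcodes renamed according to remap."""
    out = []
    last = 0
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name == "GLOBAL" and arg in remap:
            old = _encode_global(arg)
            # Sanity check: the bytes at pos must be exactly the opcode we expect.
            assert data[pos:pos + len(old)] == old
            out.append(data[last:pos])
            out.append(_encode_global(remap[arg]))
            last = pos + len(old)
    out.append(data[last:])
    return b"".join(out)
```

Because only the opcode at the reported position is replaced, array payload
bytes that happen to contain the same text are left alone.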

Cheers,
Eli

On Fri, Jun 24, 2016 at 5:46 PM, Eli Stevens (Gmail)
<wickedgrey at gmail.com> wrote:
> Okay, if I pass the pickles through pickletools.optimize, they look
> identical except for the very first line (and a resulting systematic
> shift in offset):
>
> >>> pt.dis(pt.optimize(open('cp123.pkl').read()))
> 0: c GLOBAL 'numpy.core.multiarray _reconstruct'
>
> >>> pt.dis(pt.optimize(open('pp123.pkl').read()))
> 0: c GLOBAL '_numpypy.multiarray _reconstruct'
>
> So I suspect that simply lying about what class we just pickled would
> do the trick.
>
> I have no idea how acceptable that would be as a general solution,
> though. Thoughts?
>
> Eli
>
> On Fri, Jun 24, 2016 at 2:29 PM, Eli Stevens (Gmail)
> <wickedgrey at gmail.com> wrote:
>> Doesn't look like they are exactly the same:
>>
>> https://gist.github.com/elistevens/03e22f4684fb77d3edfe13ffcd406ef4
>>
>> Certainly some similarities, though.
>>
>> I'm not familiar with the pickle format, and I haven't yet had time to
>> dig in beyond this, though. Hoping I can tonight.
>>
>> Cheers,
>> Eli
>>
>>
>> On Fri, Jun 24, 2016 at 1:21 PM, matti picus <matti.picus at gmail.com> wrote:
>>> The first step would be to pickle the same dtype/shape/data ndarray once from numpy and again from _numpypy, and to compare the binary result. The only difference should be the class name; if the difference goes deeper, that difference must be fixed. Then it is just a matter of patching pickle.py to use the desired class instead of the class name encoded into the pickled binary result.
>>> Matti
>>>
>>>> On 24 Jun 2016, at 10:43 PM, Eli Stevens (Gmail) <wickedgrey at gmail.com> wrote:
>>>>
>>>> Yeah, looks like that's still the case:
>>>>
>>>> >>> z = np.zeros((2,3), dtype=np.float32)
>>>> >>> z.tofile
>>>> Traceback (most recent call last):
>>>>  File "<stdin>", line 1, in <module>
>>>> AttributeError: 'numpy.ndarray' object has no attribute 'tofile'
>>>>
>>>> What would it take to get cross-interpreter numpy array pickles working?
>>>>
>>>> Thanks,
>>>> Eli
>>>>
>>>>