pickle numpy array from pypy to cpython?
I'm trying to construct some data that includes numpy arrays in pypy, pickle it, then unpickle it in cpython (to use some non-pypy-compatible libs). However, the actual class of the pickled array is _numpypy.multiarray, which cpython doesn't have.

Any suggestions?

Thanks,
Eli
On 24 June 2016 at 12:14, Eli Stevens (Gmail) <wickedgrey@gmail.com> wrote:
> I'm trying to construct some data that includes numpy arrays in pypy, pickle it, then unpickle it in cpython (to use some non-pypy-compatible libs).
> However, the actual class of the pickled array is _numpypy.multiarray, which cpython doesn't have.
> Any suggestions?
Have you considered the tofile method and fromfile function?

http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tofile.htm...
http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html#nump...

--
William Leslie

Notice: Likely much of this email is, by the nature of copyright, covered under copyright law. You absolutely MAY reproduce any part of it in accordance with the copyright law of the nation you are reading this in. Any attempt to DENY YOU THOSE RIGHTS would be illegal without prior contractual agreement.
No, since it's not *just* a numpy array I need to move around (a dict with numpy values in this case; more complicated objects in the future). Obviously I can kludge something manual together (assuming the tofile/fromfile functions work cross-interpreter, which I wouldn't take for granted at this point), but I'd rather be able to use pickle (it's easier to work with libraries that also expect pickles, etc.).

Eli

On Thu, Jun 23, 2016 at 7:54 PM, William ML Leslie <william.leslie.ttg@gmail.com> wrote:
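The "manual kludge" Eli alludes to would amount to walking the structure and replacing each array with plain builtins before pickling, then reversing it after unpickling. This is only an illustrative sketch under that assumption; the pack/unpack helper names are made up, and it only handles dicts of arrays:

```python
import numpy as np

def pack(obj):
    # Replace each ndarray with a portable (tag, dtype, shape, bytes) tuple
    # built only from plain Python types, so the pickle never references an
    # interpreter-specific array class.
    if isinstance(obj, np.ndarray):
        return ("ndarray", str(obj.dtype), obj.shape, obj.tobytes())
    if isinstance(obj, dict):
        return {k: pack(v) for k, v in obj.items()}
    return obj

def unpack(obj):
    # Reverse of pack(): rebuild ndarrays from the tagged tuples.
    if isinstance(obj, tuple) and obj and obj[0] == "ndarray":
        _, dtype, shape, data = obj
        return np.frombuffer(data, dtype=dtype).reshape(shape)
    if isinstance(obj, dict):
        return {k: unpack(v) for k, v in obj.items()}
    return obj
```

A dict of arrays then round-trips through an ordinary pickle on either interpreter, at the cost of having to special-case every container type by hand.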
Last time I tried tofile and fromfile in Numpypy, it was not implemented.

On Fri, Jun 24, 2016 at 7:01 AM, Eli Stevens (Gmail) <wickedgrey@gmail.com> wrote:
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev
Yeah, looks like that's still the case:
>>>> z = np.zeros((2,3), dtype=np.float32)
>>>> z.tofile
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'tofile'
What would it take to get cross-interpreter numpy array pickles working?

Thanks,
Eli

On Thu, Jun 23, 2016 at 10:14 PM, David Brochart <david.brochart@gmail.com> wrote:
The first step would be to pickle the same dtype/shape/data ndarray once from numpy and again from _numpypy, and to compare the binary results. The only difference should be the class name; if the difference goes deeper, that difference must be fixed. Then it is just a matter of patching pickle.py to use the desired class instead of the class name encoded into the pickled binary result.

Matti
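Matti's first step (diffing the two pickle streams) can be mechanized with pickletools. The sketch below compares two streams of an ordinary object, since nothing PyPy-specific is needed to show the shape of the comparison; the helper name is made up:

```python
import io
import pickle
import pickletools

def dis_lines(payload):
    # Disassemble an (optimized) pickle stream into a list of opcode lines.
    out = io.StringIO()
    pickletools.dis(pickletools.optimize(payload), out=out)
    return out.getvalue().splitlines()

a = pickle.dumps({"x": [1.0, 2.0, 3.0]})
b = pickle.dumps({"x": [1.0, 2.0, 3.0]})

# For streams produced by two different interpreters, any pair that differs
# (e.g. a GLOBAL opcode naming a different module) pinpoints the gap.
diff = [pair for pair in zip(dis_lines(a), dis_lines(b)) if pair[0] != pair[1]]
```

Here the two streams come from the same interpreter, so `diff` is empty; run against a CPython pickle and a PyPy pickle of the same ndarray, it would isolate exactly the opcodes that disagree.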
Doesn't look like they are exactly the same:

https://gist.github.com/elistevens/03e22f4684fb77d3edfe13ffcd406ef4

Certainly some similarities, though. I'm not familiar with the pickle format, and I haven't yet had time to dig in beyond this. Hoping I can tonight.

Cheers,
Eli

On Fri, Jun 24, 2016 at 1:21 PM, matti picus <matti.picus@gmail.com> wrote:
Okay, if I pass the pickles through pickletools.optimize, they look identical except for the very first line (and a resulting systematic shift in offset):
>>>> pt.dis(pt.optimize(open('cp123.pkl').read()))
    0: c    GLOBAL     'numpy.core.multiarray _reconstruct'

>>>> pt.dis(pt.optimize(open('pp123.pkl').read()))
    0: c    GLOBAL     '_numpypy.multiarray _reconstruct'
So I suspect that simply lying about what class we just pickled would do the trick. I have no idea how acceptable that would be as a general solution, though. Thoughts?

Eli

On Fri, Jun 24, 2016 at 2:29 PM, Eli Stevens (Gmail) <wickedgrey@gmail.com> wrote:
Heh, interestingly, if I add the following to the local dir and files when trying to unpickle under cpython, it works (note that cpython to pypy actually works out of the box, which I hadn't realized):

$ cat _numpypy/__init__.py
from numpy.core import *

$ cat _numpypy/multiarray.py
from numpy.core.multiarray import *
import numpy.core.multiarray as _ncm
_reconstruct = _ncm._reconstruct

This is obviously a total hack, and not one I'm comfortable with (since I need to use this codebase from both cpython and pypy), but it demonstrates that it's just bookkeeping that needs to change to get things to work.

My first approach would be to add a wrapper around save_global here:

https://bitbucket.org/pypy/pypy/src/a0105e0d00dbd0f73d06fc704db704868a6c6ed2/lib-python/2.7/pickle.py?at=default&fileviewer=file-view-default#pickle.py-814

that special-cases the global '_numpypy.multiarray' to instead be 'numpy.core.multiarray'. Does that seem like a reasonable thing to do?

Cheers,
Eli

On Fri, Jun 24, 2016 at 5:46 PM, Eli Stevens (Gmail) <wickedgrey@gmail.com> wrote:
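For reference, the same renaming can also be applied purely on the loading side, without patching either interpreter, via the standard Unpickler.find_class hook. This is just a sketch of the idea; the alias table is an assumption based on the module names seen in this thread:

```python
import io
import pickle

# Assumed mapping from PyPy's internal module name to the CPython location
# of the equivalent globals.
MODULE_ALIASES = {"_numpypy.multiarray": "numpy.core.multiarray"}

class RenamingUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Redirect lookups for aliased modules before resolving the global.
        module = MODULE_ALIASES.get(module, module)
        return super().find_class(module, name)

def loads_compat(payload):
    # Drop-in replacement for pickle.loads that tolerates the aliased names.
    return RenamingUnpickler(io.BytesIO(payload)).load()
```

The trade-off versus fixing save_global is that every consumer has to use the custom unpickler, whereas fixing the writer makes the pickles correct for everyone.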
Sounds reasonable. You might want to generalize it a bit by trying to import _numpypy/numpy, and setting up the replacement according to whichever fails to import.

Matti

On Saturday, 25 June 2016, Eli Stevens (Gmail) <wickedgrey@gmail.com> wrote:
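Matti's generalization might look roughly like this at module-import time. The variable names are invented for illustration, and the empty load-side alias under PyPy reflects Eli's observation that cpython-to-pypy already works out of the box:

```python
try:
    import _numpypy  # only importable under PyPy
    ON_PYPY = True
except ImportError:
    ON_PYPY = False

# When pickling under PyPy, emit CPython's module name; when loading
# under CPython, redirect PyPy's name back to the local implementation.
if ON_PYPY:
    SAVE_ALIAS = {"_numpypy.multiarray": "numpy.core.multiarray"}
    LOAD_ALIAS = {}
else:
    SAVE_ALIAS = {}
    LOAD_ALIAS = {"_numpypy.multiarray": "numpy.core.multiarray"}
```

Either side of the pickle then consults the appropriate table, and the dispatch costs nothing on the interpreter that doesn't need it.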
I was thinking about doing it on import of the micronumpy module (pypy/module/micronumpy/app_numpy.py). Right now, when I try and import pickle during the tests:

$ cat pypy/module/micronumpy/test/test_pickling_app.py
import sys
import py

from pypy.module.micronumpy.test.test_base import BaseNumpyAppTest
from pypy.conftest import option

class AppTestPicklingNumpy(BaseNumpyAppTest):
    def setup_class(cls):
        if option.runappdirect and '__pypy__' not in sys.builtin_module_names:
            py.test.skip("pypy only test")
        BaseNumpyAppTest.setup_class.im_func(cls)

    def test_pickle_module(self):
        import pickle
        ...  # more code

I get this error:

    import struct
lib-python/2.7/pickle.py:34:
_ _ _ _ _ _
    from _struct import *
E   (application-level) ImportError: No module named _struct
lib-python/2.7/struct.py:1: ImportError

But everything seems fine with struct:

$ ./pytest.py pypy/module/struct/test/test_struct.py
==== test session starts ====
platform linux2 -- Python 2.7.11 -- py-1.4.20 -- pytest-2.5.2
pytest-2.5.2 from /home/elis/edit/play/pypy/pytest.pyc
collected 30 items

pypy/module/struct/test/test_struct.py ..............................

==== 30 passed in 11.95 seconds ====

Any idea what's going on here?

Thanks,
Eli

On Fri, Jun 24, 2016 at 9:19 PM, matti picus <matti.picus@gmail.com> wrote:
You need to add the modules to those that the class-local space is built with using a spaceconfig, so something like:

class AppTestPicklingNumpy(BaseNumpyAppTest):
    spaceconfig = dict(usemodules=["micronumpy", "struct", "binascii"])

    def setup_class(cls):
        if option.runappdirect and '__pypy__' not in sys.builtin_module_names:
            py.test.skip("pypy only test")
        BaseNumpyAppTest.setup_class.im_func(cls)

    def test_pickle_module(self):
        import pickle

On 25/06/16 09:01, Eli Stevens (Gmail) wrote:
That did the trick. Pull request here:

https://bitbucket.org/pypy/pypy/pull-requests/460/changes-reported-location-...

Please let me know if there are changes that should be made. As noted, I'm not super happy with the tests, but am unsure what direction I should go with them.

Cheers,
Eli

On Sat, Jun 25, 2016 at 10:26 AM, Matti Picus <matti.picus@gmail.com> wrote:
Any thoughts on whether this approach is acceptable? Happy to incorporate feedback. I wouldn't be surprised if there are more functions than just _reconstruct that will need to be special-cased, but without a concrete use case I wasn't going to complicate things.

Thanks,
Eli

On Sat, Jun 25, 2016 at 8:19 PM, Eli Stevens (Gmail) <wickedgrey@gmail.com> wrote:
I think I would prefer this be done in upstream numpy (which is 95% supported by PyPy's cpyext layer) rather than changing the class name when saving a _numpypy ndarray. In both cases, a warning should be emitted when loading the "wrong" object, to tell the user that subtle problems may occur, for instance with complicated record dtypes or with arrays of objects.

Your pull request seems OK, but it needs tests of more complicated numpy types like scalars and record arrays. Again, I would be happier if it spat out some kind of warning when overriding the object name. Maybe we should merge it until we can fix upstream numpy? Does anyone else have an opinion?

Matti

On 29/06/16 19:59, Eli Stevens (Gmail) wrote:
To make sure I'm understanding, are you saying that upstream/cpython numpy should pick up an alternate way to import multiarray (via _numpypy.multiarray, instead of numpy.core.multiarray), similar to how one can `import numpy` under pypy, even though the real implementation is in `_numpypy`?

Thanks,
Eli

On Wed, Jun 29, 2016 at 11:31 AM, Matti Picus <matti.picus@gmail.com> wrote:
Hi Eli, hi Matti,

On 29 June 2016 at 21:37, Eli Stevens (Gmail) <wickedgrey@gmail.com> wrote:
> To make sure I'm understanding, are you saying that upstream/cpython numpy should pick up an alternate way to import multiarray (via _numpypy.multiarray, instead of numpy.core.multiarray)
Hum, in my opinion we should always pickle/unpickle arrays by reproducing and expecting the exact same format as CPython's numpy, with no warnings. Any difference (e.g. with complicated dtypes) is a bug that should eventually be fixed.

A bientôt,

Armin.
FWVLIW, I think that conforming to upstream numpy makes the most sense. I think that the issue would go away if the `_numpypy` module were renamed to `numpy` (and appropriate things moved into `numpy.core`). Is there a technical reason to keep the actual implementation in a separately named module?

Thinking larger picture, would it be possible and sensible to switch to using the slow cpyext numpy approach for compatibility, then overlay custom implementations of things on top of that when speed is needed? I'm imagining a vague inverse of the cpython approach, where modules are implemented in C when the python performance isn't acceptable.

Eli

On Wed, Jun 29, 2016 at 10:58 PM, Armin Rigo <arigo@tunes.org> wrote:
Hi,

I'd like to use the (numerical) performance of PyPy as an equivalent to Numba's @jit decorator (https://github.com/davidbrochart/piopio). The only thing preventing that right now is the passing around (pickling) of Numpy arrays, so it would be great to have that compatibility.

David.

On Mon, Jul 11, 2016 at 6:43 PM, Eli Stevens (Gmail) <wickedgrey@gmail.com> wrote:
The issue with '_numpypy.multiarray' appearing in the pickle string rather than 'numpy.core.multiarray' should be fixed on the numpypy_pickle_compat branch (thanks to Eli). A Linux 64 build is available at http://buildbot.pypy.org/nightly/numpypy_pickle_compat/pypy-c-jit-85727-6d90....

Eli or David or anyone who uses numpy pickles, could you check that this works as advertised? I am concerned about how compatible our pickling is with upstream numpy, but I do not really use that feature of numpy, so another pair of eyes would be nice before merging to default.

Note this requires that http://bitbucket.org/pypy/numpy be installed, since the Unpickler must be able to import numpy.core.multiarray.

Matti

On 15/07/16 10:47, David Brochart wrote:
Hi,
I'd like to use the (numerical) performances of PyPy as an equivalent to Numba's @jit decorator (https://github.com/davidbrochart/piopio). The only thing preventing that right now is the passing around (pickling) of Numpy arrays, so it would be great to have that compatibility.
David.
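Matti's request — verifying that a pickle produced on one side loads on the other — comes down to a small round-trip script. In practice the dump would run under the branch build's PyPy and the load under CPython; here both halves run in one process just to show the shape of the check (the filename is only for illustration):

```python
import pickle
import numpy as np

# Step 1 (under PyPy): pickle a structure mixing numpy values and plain objects.
payload = {"weights": np.linspace(0.0, 1.0, 5), "label": "demo"}
with open("payload.pkl", "wb") as f:
    pickle.dump(payload, f, protocol=2)  # protocol 2 is understood by both sides

# Step 2 (under CPython): unpickle and verify the contents survived intact.
with open("payload.pkl", "rb") as f:
    restored = pickle.load(f)

assert restored["label"] == "demo"
assert (restored["weights"] == payload["weights"]).all()
```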
Hi,

I verified that this version of PyPy can load a Numpy array that was pickled by CPython (and do stuff with it), but it looks like a Numpy array pickled by PyPy cannot be loaded by CPython, because PyPy still uses '_numpypy.multiarray' in the pickle string when dumping:

ImportError: No module named _numpypy.multiarray

David.

On Sat, Jul 16, 2016 at 12:07 PM, Matti Picus <matti.picus@gmail.com> wrote:
The issue with '_numpypy.multiarray' in the pickle string rather than 'numpy.core.multiarray' should be fixed on the numpypy_pickle_compat branch (thanks to Eli) A linux 64 build is available http://buildbot.pypy.org/nightly/numpypy_pickle_compat/pypy-c-jit-85727-6d90... . Eli or David or anyone who uses numpy pickle, could you check that this works as advertised? I am concerned about how compatible our pickling is with upstream numpy, but do not really use that feature of numpy so another pair of eyes would be nice before merging to default.
Note this requires that http://bitbucket.org/pypy/numpy be installed since the Unpickler must be able to import numpy.core.multiarray Matti
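The quickest way to see why the other interpreter raises an ImportError like the one David reports is to disassemble the stream: `pickletools.dis` shows the module path the Unpickler will try to import.

```python
import io
import pickle
import pickletools
import numpy as np

blob = pickle.dumps(np.array([1.0, 2.0]))

# The GLOBAL/STACK_GLOBAL opcode carries the 'module.attribute' the Unpickler
# imports; on CPython's numpy this names a multiarray module, while a stream
# built by an unfixed PyPy would show '_numpypy.multiarray' here instead.
out = io.StringIO()
pickletools.dis(blob, out=out)
listing = out.getvalue()
assert 'multiarray' in listing
```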
To be more precise, PyPy pickling of Numpy arrays works fine; it is when PyPy pickles a Numpy scalar that I get the error.

David.

On Sat, Jul 16, 2016 at 2:04 PM, David Brochart <david.brochart@gmail.com> wrote:
Hi,
I verified that this version of PyPy can load a Numpy array that was pickled by CPython (and do stuff with it), but it looks like a Numpy array pickled by PyPy cannot be loaded by CPython, because PyPy still uses '_numpypy.multiarray' in the pickle string for dumping: ImportError: No module named _numpypy.multiarray
David.
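David's distinction matters because arrays and scalars are rebuilt by different functions named in the pickle stream, so fixing the array path does not automatically fix the scalar path. A sketch under CPython's numpy, where both round-trip:

```python
import pickle
import numpy as np

arr = np.array([1.0, 2.0])
scalar = np.float64(3.0)

# An ndarray pickles via a _reconstruct-style helper, while a numpy scalar
# pickles via a separate scalar-rebuilding function, so the two streams name
# different reconstructors even though both live in numpy's multiarray module.
arr_blob = pickle.dumps(arr)
scalar_blob = pickle.dumps(scalar)

assert (pickle.loads(arr_blob) == arr).all()
assert pickle.loads(scalar_blob) == scalar
assert isinstance(pickle.loads(scalar_blob), np.float64)
```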
I am not surprised that my current branch doesn't cover all cases; it was specifically targeted at my exact, singular use case. I'll work on making something more general, as well as improving test coverage.

On Sat, Jul 16, 2016 at 9:29 AM, Matti Picus <matti.picus@gmail.com> wrote:
So it seems the tests are lacking. Someone should:
- go through all the existing calls to dumps in tests and add "assert '_numpypy' not in data"
- add tests for scalars
- fix so the tests pass
Matti
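Matti's checklist could be sketched as a single helper applied to both arrays and scalars (the name `assert_cpython_loadable` is hypothetical, just for illustration):

```python
import pickle
import numpy as np

def assert_cpython_loadable(obj):
    """Matti's suggested check: the stream must not name PyPy's private
    _numpypy module, or CPython's Unpickler cannot import the reconstructor."""
    data = pickle.dumps(obj)
    assert b'_numpypy' not in data, "pickle references a PyPy-only module"
    return data

# Cover arrays *and* scalars, since they pickle through different paths.
for obj in (np.arange(3), np.float64(1.5), np.int32(7)):
    assert_cpython_loadable(obj)
```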
participants (6)
- Armin Rigo
- David Brochart
- Eli Stevens (Gmail)
- matti picus
- Matti Picus
- William ML Leslie