Multiprocessing, shared memory vs. pickled copies

John Ladasky ladasky at my-deja.com
Thu Apr 7 08:40:53 CEST 2011


Hello again, Philip,

I really appreciate you sticking with me.  Hopefully this will help
someone else, too.  I've done some more reading, and will offer some
minimal code below.

I've known about this page for a while, and it describes some of the
unconventional things one needs to consider when subclassing
numpy.ndarray:

http://www.scipy.org/Subclasses

Now, section 11.1 of the Python documentation says the following
concerning pickling:  "Classes can further influence how their
instances are pickled; if the class defines the method __getstate__(),
it is called and the return state is pickled as the contents for the
instance, instead of the contents of the instance’s dictionary. If
there is no __getstate__() method, the instance’s __dict__ is
pickled."

http://docs.python.org/release/2.6.6/library/pickle.html

That being said, I'm having problems there!  I've found this minimal
example of pickling with overridden __getstate__ and __setstate__
methods:

http://stackoverflow.com/questions/1939058/please-give-me-a-simple-example-about-setstate-and-getstate-thanks

I'll put all of the thoughts generated by these links together after
responding to a few things you wrote.

On Apr 5, 10:43 am, Philip Semanchuk <phi... at semanchuk.com> wrote:
> But as Dan Stromberg pointed out, there are some pickle-free ways to communicate between processes using multiprocessing.

I only see your reply to Dan Stromberg in this thread, but not Dan
Stromberg's original post.  I am reading this through Google Groups.
Perhaps Dan's post failed to make it through a gateway for some
reason?

> As a side note, you should always use "new style" classes, particularly since you're exploring the details of Python class construction. "New" is a bit of a misnomer now, as "new" style classes were introduced in Python 2.2. They have been the status quo in Python 2.x for a while now and are the only choice in Python 3.x.

Sorry, that was an oversight on my part.  Normally I do remember that,
but I've been doing a lot of subclassing rather than defining top-
level classes, and therefore it slipped my mind.

> One can pickle user-defined classes:

OK, I got that working, I'll show an example farther down.  (I never
tried to pickle a minimal class until now, I went straight for a hard
one.)

> And as Robert Kern pointed out, numpy arrays are also pickle-able.

OK, but SUBCLASSES of numpy.ndarray are not, in my hands, pickling as
I would expect.  I already have lots of code that is based on such
subclasses, and they do everything I want EXCEPT pickle correctly.  I
may have to direct this line of questions to the numpy discussion
group after all.

This is going to be a longer exposition.  So let me thank you in
advance for your patience, and ask you to get comfortable.


========== begin code ==========

from numpy import array, ndarray
from pickle import dumps, loads

##===

class Foo(object):

    def __init__(self, contents):
        self.contents = contents

    def __str__(self):
        return str(type(self)) + "\n" + str(self.__dict__)

##===

class SuperFoo(Foo):

    def __getstate__(self):
        print "__getstate__"
        duplicate = dict(self.__dict__)
        duplicate["bonus"] = "added during pickling"
        return duplicate

    def __setstate__(self, d):
        print "__setstate__"
        self.__dict__ = d
        self.finale = "added during unpickling"

##===

class ArraySubclass(ndarray):
    """
    See http://www.scipy.org/Subclasses, this class is very similar.
    """
    def __new__(subclass, data, extra=None, dtype=None, copy=False):
        print "  __new__"
        arr = array(data, dtype=dtype, copy=copy)
        arr = arr.view(subclass)
        if extra is not None:
            arr.extra = extra
        elif hasattr(data, "extra"):
            arr.extra = data.extra
        return arr

    def __array_finalize__(self, other):
        print "  __array_finalize__"
        self.__dict__ = getattr(other, "__dict__", {})

    def __str__(self):
        return str(type(self)) + "\n" + ndarray.__str__(self) + \
                    "\n__dict__ : " + str(self.__dict__)

##===

class PicklableArraySubclass(ArraySubclass):

    def __getstate__(self):
        print "__getstate__"
        return self.__dict__

    def __setstate__(self, d):
        print "__setstate__"
        self.__dict__ = d

##===

print "\n\n** Create a Foo object, then create a copy via pickling."
original = Foo("pickle me please")
print original
clone = loads(dumps(original))
print clone

print "\n\n** Create a SuperFoo object, just to watch __setstate__ and __getstate__."
original = SuperFoo("pickle me too, please")
print original
clone = loads(dumps(original))
print clone

print "\n\n** Create a numpy ndarray, then create a copy via pickling."
original = array(((1,2,3),(4,5,6)))
print original
clone = loads(dumps(original))
print clone

print "\n\n** Create an ArraySubclass object, with meta-data..."
original = ArraySubclass(((9,8,7),(6,5,4)), extra = "pickle me PLEASE!")
print original
print "\n...now attempt to create a copy via pickling."
clone = loads(dumps(original))
print clone

print "\n\n** That failed, try a PicklableArraySubclass..."
original = PicklableArraySubclass(((1,2),(3,4)), extra = "pickle, dangit!!!")
print original
print "\n...now try to create a copy of the PicklableArraySubclass via pickling."
clone = loads(dumps(original))
print clone


========== end code, begin output ==========


** Create a Foo object, then create a copy via pickling.
<class '__main__.Foo'>
{'contents': 'pickle me please'}
<class '__main__.Foo'>
{'contents': 'pickle me please'}


** Create a SuperFoo object, just to watch __setstate__ and __getstate__.
<class '__main__.SuperFoo'>
{'contents': 'pickle me too, please'}
__getstate__
__setstate__
<class '__main__.SuperFoo'>
{'bonus': 'added during pickling', 'finale': 'added during unpickling', 'contents': 'pickle me too, please'}


** Create a numpy ndarray, then create a copy via pickling.
[[1 2 3]
 [4 5 6]]
[[1 2 3]
 [4 5 6]]


** Create an ArraySubclass object, with meta-data...
  __new__
  __array_finalize__
  __array_finalize__
  __array_finalize__
<class '__main__.ArraySubclass'>
[[9 8 7]
 [6 5 4]]
__dict__ : {'extra': 'pickle me PLEASE!'}

...now attempt to create a copy via pickling.
  __array_finalize__
  __array_finalize__
  __array_finalize__
<class '__main__.ArraySubclass'>
[[9 8 7]
 [6 5 4]]
__dict__ : {}


** That failed, try a PicklableArraySubclass...
  __new__
  __array_finalize__
  __array_finalize__
  __array_finalize__
<class '__main__.PicklableArraySubclass'>
[[1 2]
 [3 4]]
__dict__ : {'extra': 'pickle, dangit!!!'}

...now try to create a copy of the PicklableArraySubclass via pickling.
  __array_finalize__
__setstate__
Traceback (most recent call last):
  File "minimal ndarray subclass example for posting.py", line 109, in <module>
    clone = loads(dumps(original))
  File "/usr/lib/python2.6/pickle.py", line 1374, in loads
    return Unpickler(file).load()
  File "/usr/lib/python2.6/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.6/pickle.py", line 1217, in load_build
    setstate(state)
  File "minimal ndarray subclass example for posting.py", line 75, in __setstate__
    self.__dict__ = d
TypeError: __dict__ must be set to a dictionary, not a 'tuple'
>Exit code: 1

========== end output ==========

Observations:

My first three examples work fine.  The second example, my SuperFoo
class, shows that I can intercept pickling attempts and add data to
__dict__.

Problems appear in the fourth example, when I try to pickle an
ArraySubclass object.  It gets the array itself, but __dict__ is not
copied, despite the fact that that is supposed to be Python's default
behavior, and my first example did indeed show this default behavior.
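
The lost __dict__ is consistent with how pickle treats any class that
supplies its own __reduce__: the default "pickle the instance's
__dict__" path is then skipped, and numpy.ndarray does define
__reduce__.  Below is a minimal sketch of that effect without numpy.
Base and Tagged are hypothetical stand-ins, not numpy classes, and the
real ndarray state is more involved than a single payload.

```python
import pickle

class Base(object):
    """Hypothetical stand-in for a type that supplies its own __reduce__."""

    def __init__(self, payload):
        self.payload = payload

    def __reduce__(self):
        # Describe only the payload; anything else in __dict__ is ignored.
        return (self.__class__, (self.payload,))

class Tagged(Base):
    """Subclass that carries extra metadata in its __dict__."""
    pass

original = Tagged([1, 2, 3])
original.extra = "metadata"

clone = pickle.loads(pickle.dumps(original))
print(clone.payload)               # the payload survives the round trip
print("extra" in clone.__dict__)   # the metadata does not
```

If this is what is happening with ArraySubclass, the array data comes
back because ndarray's own __reduce__ describes it, while "extra" is
dropped because nothing ever asks for __dict__.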

So in my fifth and final example, the PicklableArraySubclass object,
I tried to override __getstate__ and __setstate__ just as I did with
SuperFoo.  Here's what I see: 1) __getstate__ is NEVER called.  2)
__setstate__ is called, but it expects a dictionary as its argument
and, according to the traceback, receives a tuple instead.

What's up with that?
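
Both symptoms can be reproduced without numpy when a base class
defines __reduce__ and returns a tuple as its state, which is what
ndarray does.  The sketch below first shows the symptoms (Naive), then
one possible workaround (WithMeta) that appends the extra attributes
to the parent's state tuple.  TupleState, Naive and WithMeta are
hypothetical names, and the two-element state tuple is an assumption
made for illustration; it is not ndarray's actual state layout.

```python
import pickle

calls = []

class TupleState(object):
    """Hypothetical stand-in: __reduce__ hands pickle a tuple as state."""

    def __init__(self, values=()):
        self.values = list(values)

    def __reduce__(self):
        # (callable, constructor args, state); the state is a tuple.
        return (self.__class__, (), (1, tuple(self.values)))

    def __setstate__(self, state):
        version, values = state
        self.values = list(values)

class Naive(TupleState):
    """Overrides __getstate__/__setstate__ the way SuperFoo does."""

    def __getstate__(self):
        calls.append("__getstate__")
        return self.__dict__

    def __setstate__(self, state):
        calls.append("__setstate__ got " + type(state).__name__)
        TupleState.__setstate__(self, state)

class WithMeta(TupleState):
    """Possible workaround: extend the parent's state tuple."""

    def __reduce__(self):
        factory, args, state = TupleState.__reduce__(self)
        extras = dict((k, v) for k, v in self.__dict__.items()
                      if k != "values")
        return (factory, args, state + (extras,))

    def __setstate__(self, state):
        # The last element is ours; the rest belongs to the parent.
        TupleState.__setstate__(self, state[:-1])
        self.__dict__.update(state[-1])

pickle.loads(pickle.dumps(Naive([1, 2])))
print(calls)   # __getstate__ is never recorded

original = WithMeta([9, 8, 7])
original.extra = "pickle me PLEASE!"
clone = pickle.loads(pickle.dumps(original))
print("%s %s %s" % (type(clone).__name__, clone.values, clone.extra))
```

For a real ndarray subclass the same idea would mean overriding
__reduce__ and __setstate__, calling ndarray's versions and adding the
extra data to the state, rather than overriding __getstate__.  I have
not verified that against numpy here, so treat it as a direction to
try.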


