Multiprocessing, shared memory vs. pickled copies

John Ladasky ladasky at my-deja.com
Tue Apr 5 12:58:05 EDT 2011


Hi Philip,

Thanks for the reply.

On Apr 4, 4:34 pm, Philip Semanchuk <phi... at semanchuk.com> wrote:
> So if you're going to use multiprocessing, you're going to use pickle, and you
> need pickleable objects.

OK, that's good to know.
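
A minimal sketch of what that implies, assuming Python 3 and only the standard pickle module: an object that crosses the pickle boundary arrives as an independent copy, so a change made on one side is not seen on the other. The names `data` and `received` are made up for illustration.

```python
import pickle

# Hypothetical payload standing in for whatever is handed to a worker process.
data = {"weights": [1.0, 2.0, 3.0]}

# Serializing and deserializing is roughly what happens when an object
# is sent to another process through a multiprocessing queue or pool.
received = pickle.loads(pickle.dumps(data))

# Mutating the copy leaves the original untouched.
received["weights"].append(4.0)

print(data)
print(received)
```

This is the "pickled copies" half of the subject line: nothing here is shared memory, so results have to be sent back explicitly.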

> > Pickling is still a pretty
> > vague process to me, but I can see that you have to write custom
> > __reduce__ and __setstate__ methods for your objects.
>
> Well, that's only if one's objects don't support pickle by default. A lot of
> classes do without any need for custom __reduce__ and __setstate__ methods.
> Since you're apparently not too familiar with pickle, I don't want you to get
> the false impression that it's a lot of trouble. I've used pickle a number of
> times and never had to write custom methods for it.

I used __reduce__ and __setstate__ once before with success, but I
just hacked at code I found on the Net, without fully understanding
it.

All right, since my last post, I've made an attempt to understand a
bit more about pickle.  In doing so, I'm getting into the guts of the
Python class model, in a way that I did not expect.  I believe that I
need to understand the following:

1) What kinds of objects have a __dict__,

2) Why the default pickling methods do not simply look for a __dict__
and serialize its contents, if it is present, and

3) Why objects can exist that do not support pickle by default, and
cannot be correctly pickled without custom __reduce__ and __setstate__
methods.

I can only furnish a (possibly incomplete) answer to question #1.
User subclasses apparently always have a __dict__.  Here:

    >>> class Foo:
    ...     pass # It doesn't get any simpler
    ...
    >>> x = Foo()
    >>> x
    <__builtin__.Foo instance at 0x9b916cc>
    >>> x.__dict__
    {}
    >>> x.spam = "eggs"
    >>> x.__dict__
    {'spam': 'eggs'}
    >>> setattr(x, "parrot", "dead")
    >>> dir(x)
    ['__doc__', '__module__', 'parrot', 'spam']
    >>> x.__dict__
    {'parrot': 'dead', 'spam': 'eggs'}
    >>> x.__dict__ = {"a":"b", "c":"d"}
    >>> dir(x)
    ['__doc__', '__module__', 'a', 'c']
    >>> x.__dict__
    {'a': 'b', 'c': 'd'}

Aside: this finding makes me wonder what exactly dir() is showing me
about an object (all objects in the namespace, including methods, but
not __dict__ itself?), and why it is distinct from __dict__.
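
One way to probe the difference, as a sketch assuming Python 3 (where every class is new-style, so the dir() listing is longer than the one above): an instance's __dict__ holds only what was set on that instance, while dir() also walks the class and its bases. The class `Foo` and its attributes `kind` and `greet` are hypothetical.

```python
class Foo:
    kind = "demo"  # class attribute: stored on Foo, not on each instance

    def greet(self):
        return "hello"


x = Foo()
x.spam = "eggs"

instance_names = set(vars(x))   # vars(x) returns x.__dict__
class_names = set(vars(Foo))    # the class has its own, separate __dict__
dir_names = set(dir(x))         # instance names plus everything inherited

print(sorted(instance_names))
print("greet" in instance_names, "greet" in class_names, "greet" in dir_names)
```

So methods and class attributes show up in dir(x) because they are reachable from x, even though they live in the class's namespace rather than in x.__dict__.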

In contrast, it appears that BUILT-IN classes (numbers, lists,
dictionaries, etc.) do not have a __dict__, and that you can't force
a __dict__ into built-in objects, no matter how you may try.

    >>> n = 1
    >>> dir(n)
    ['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__',
     '__delattr__', '__div__', '__divmod__', '__doc__', '__float__',
     '__floordiv__', '__format__', '__getattribute__', '__getnewargs__',
     '__hash__', '__hex__', '__index__', '__init__', '__int__',
     '__invert__', '__long__', '__lshift__', '__mod__', '__mul__',
     '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__',
     '__pow__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__',
     '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__',
     '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__rpow__',
     '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__',
     '__setattr__', '__sizeof__', '__str__', '__sub__',
     '__subclasshook__', '__truediv__', '__trunc__', '__xor__',
     'conjugate', 'denominator', 'imag', 'numerator', 'real']
    >>> n.__dict__
      File "<console>", line 1, in <module>
    ''' <type 'exceptions.AttributeError'> : 'int' object has no attribute '__dict__' '''
    >>> n.__dict__ = {"a":"b", "c":"d"}
      File "<console>", line 1, in <module>
    ''' <type 'exceptions.AttributeError'> : 'int' object has no attribute '__dict__' '''
    >>> setattr(n, "__dict__", {"a":"b", "c":"d"})
      File "<console>", line 1, in <module>
    ''' <type 'exceptions.AttributeError'> : 'int' object has no attribute '__dict__' '''
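
The picture has a couple of wrinkles worth checking. The sketch below assumes Python 3; the classes `Plain`, `Slotted`, and `MyInt` and the helper `has_dict` are hypothetical. It suggests that a user class declaring __slots__ has no per-instance __dict__, while a user subclass of a built-in does get one.

```python
class Plain:
    pass


class Slotted:
    __slots__ = ("value",)  # fixed attribute layout, no per-instance __dict__


class MyInt(int):
    pass  # subclassing a built-in adds a __dict__ to its instances


def has_dict(obj):
    """Report whether this particular object carries a __dict__."""
    return hasattr(obj, "__dict__")


results = {
    "int": has_dict(1),
    "list": has_dict([]),
    "Plain": has_dict(Plain()),
    "Slotted": has_dict(Slotted()),
    "MyInt": has_dict(MyInt(1)),
}
print(results)

# The subclass instance accepts new attributes; a plain int does not.
m = MyInt(1)
m.note = "tagged"
```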


This leads straight into my second question.  I THINK, without knowing
for sure, that most user classes would pickle correctly by simply
iterating through __dict__.  So, why isn't this the default behavior
for Python?  Was the assumption that programmers would only want to
pickle built-in classes?  Can anyone show me an example of a common
object where my suggested approach would fail?
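
As far as I can tell, for ordinary instances the default pickling does save the contents of __dict__, so the suggested approach is already the default. Where it breaks is when __dict__ holds something that cannot itself be serialized. A sketch assuming Python 3, with hypothetical classes `Point`, `Counter`, and `SafeCounter`; here the custom hooks are __getstate__ and __setstate__ rather than __reduce__:

```python
import pickle
import threading


class Point:
    """Plain attributes only: pickles by default via its __dict__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Counter:
    """Holds a lock, which pickle refuses to serialize."""
    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()


class SafeCounter(Counter):
    """Same state, but skips the lock when pickling and rebuilds it after."""
    def __getstate__(self):
        state = self.__dict__.copy()
        del state["lock"]          # leave the unpicklable part out
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.lock = threading.Lock()  # fresh lock on the receiving side


p = pickle.loads(pickle.dumps(Point(1, 2)))  # no custom methods needed

try:
    pickle.dumps(Counter())
    counter_pickled = True
except TypeError:
    counter_pickled = False  # the lock inside __dict__ blocks pickling

safe = SafeCounter()
safe.count = 5
restored = pickle.loads(pickle.dumps(safe))
print(p.x, p.y, counter_pickled, restored.count)
```

Open files, sockets, and similar handles to operating-system resources tend to fail the same way as the lock does here.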

> What might be easier than fooling around with boxes of sharp knives is to convert
> your ndarray objects

... with their attendant meta-data, of course...

> to Python lists. Lists are pickle-friendly and easy to turn
> back into ndarray objects once they've crossed the pickle boundary.

That's a cute trick.  I may try it, but I would rather do it the way
that Python recommends.  I always prefer to understand what I'm doing,
and why.

> Python will only get better at working with multiple cores/CPUs, but there's plenty of room for improvement on the status quo.

Thank you for agreeing with me.  I'm not formally trained in computer
science, and there was always the chance that my opinion as a non-
professional might come from simply not understanding the issues well
enough.  But I chose Python as a programming language (after Applesoft
Basic, 6502 assembler, Pascal, and C -- yeah, I've been at this a
while) because of its ease of entry.  And while Python has carried me
far, I don't think that an issue as central to modern computing as
using multiple CPUs concurrently is so arcane that we should declare
it black magic, and thereby excuse Python for not meeting its
philosophical goal of clarity and simplicity in this case.


