[Python-Dev] making python's c iterators picklable (http://bugs.python.org/issue14288)

Kristján Valur Jónsson kristjan at ccpgames.com
Tue Mar 13 23:53:42 CET 2012


Raymond suggested that this patch should be discussed here, so here goes:

How this came about:
There are frameworks, such as the Nagare web framework (http://www.nagare.org/), that rely on suspending execution at some point and resuming it again.  Nagare does this using Stackless Python: it pickles the execution state of a tasklet and resumes it later (possibly elsewhere).  Other frameworks are doing similar things in cloud computing.  I have seen such presentations at previous PyCons, and they have had to write their own picklers to get around these problems.

The problem is this:  While pickling execution state (frame objects, functions) might be considered exotic, and indeed Stackless has modifications unique to it to do so, these frameworks quickly run into trouble that really has nothing to do with the fact that they are doing such exotic things.
For example, the fact that the very common dictiter is implemented in C and not Python means that special pickle support has to be added for it; otherwise only some contexts can be pickled (those that are not currently iterating through a dict) and not others.
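To make the dictiter case concrete, here is a minimal sketch (not part of the patch) that probes whether a dict iterator in mid-iteration can be pickled.  The helper name can_pickle and the sample dict are made up for illustration, and the result depends on which interpreter version you run it on, so no expected output is shown.

```python
import pickle


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise.

    Hypothetical helper for illustration only. The broad except is
    deliberate: depending on the object, pickle may raise TypeError,
    pickle.PicklingError, or something else.
    """
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


settings = {"a": 1, "b": 2, "c": 3}
it = iter(settings)
next(it)  # the iterator is now partway through the dict

# Whether this is True or False depends on the interpreter in use.
dict_iter_ok = can_pickle(it)
print("dict iterator picklable:", dict_iter_ok)
```

Any execution state that holds a reference to such an iterator is only as picklable as the iterator itself, which is why the same code can be saved at one point and not at another.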

Now Stackless has tried to provide this functionality for many years and indeed has special pickling support for dictiters, listiters, etc. (stuff that really has nothing to do with the stacklessness of Stackless).  However, (somewhat) recently a lot of the itertools were moved into C.  Suddenly iterators that were previously picklable (by virtue of being in .py) stopped being so, just because they became C objects.  In addition, a bunch of other iterators started showing up (stringiter, bytesiter).  This started to cause problems.  Suddenly you have to arbitrarily restrict what you can and can't do in code that is using these approaches.  For Stackless (and Nagare), it was necessary to ban the usage of the _itertools module in web programs.

Instead of adding this to Stackless, and thus widening the gap between Stackless and CPython, I think it is a good idea simply to fix this in CPython itself.  Note that I also consider this to be of general utility to regular, non-exotic applications:  Why should an application that is working with a bunch of data, but wants to stop that for a bit and maybe save it out to disk, have to worry about transforming the data into valid primitive data structures before doing so?

In my opinion, any object that has simple and obvious pickle semantics should be picklable.  Iterators are just regular objects with some state.  They are not file pointers or sockets or database cursors.  And again, I argue that if these objects were implemented in .py, they would already be automatically picklable (indeed, itertools.py was).  The detail that some iterators in standard Python are implemented in C should not automatically restrict their usage for no particular reason.
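As a small illustration of the "implemented in .py" point, here is a sketch of a pure-Python list iterator.  The class name ListIter is hypothetical and it is not how any standard iterator is written; it only shows that an instance whose state is a list and an index pickles with no extra support, and resumes where it left off.  It assumes the class is defined at module top level so pickle can find it again on load.

```python
import pickle


class ListIter:
    """Pure-Python stand-in for a list iterator (hypothetical, for illustration)."""

    def __init__(self, items):
        self.items = items
        self.index = 0  # position of the next item to return

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.items):
            raise StopIteration
        value = self.items[self.index]
        self.index += 1
        return value


it = ListIter(["a", "b", "c", "d"])
first = next(it)               # consume one item before "suspending"

saved = pickle.dumps(it)       # default pickling stores the instance __dict__
resumed = pickle.loads(saved)  # could happen later, or in another process

rest = list(resumed)           # iteration continues after the consumed item
print(first, rest)
```

Nothing here is specific to iterators: the state is an ordinary list plus an integer, which is the sense in which the pickle semantics are simple and obvious.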

The patch is straightforward.  Most of it is tests, in fact.  But it does use a few tricks in some places to get around the fact that some of those iterator types are hidden.  We did try to be complete and find all the C iterators, but it was a year ago that the bulk of this work was done and something might have been added in the meantime.

Anyway, that's my pitch.