Copy (and/or pickle) generators

Add a function to generator objects to copy the entire state of it. Proposed example code:

    game1 = complicated_game_type_thing()

    # Progress the game to the first decision point
    choices = game1.send(None)

    # Choose something
    response = get_a_response(choices)

    # Copy the game generator
    game2 = game1.copy()

    # Send the same response to each game
    x = game1.send(response)
    y = game2.send(response)

    # Verify the new set of choices is the same
    assert x == y

History: I found this Stack Overflow Q&A <https://stackoverflow.com/questions/7180212/why-cant-generators-be-pickled>, which among other things linked to an in-depth explanation of why generators could not be pickled <http://peadrop.com/blog/2009/12/29/why-you-cannot-pickle-generators/>, and this enhancement request for 2.6 <https://bugs.python.org/issue1092962> on the bug tracker. All the reasons given there are perfectly valid... but they were also given nearly 10 years ago. It may be time to revisit the issue. I couldn't turn up any previous threads here related to this, so I'm throwing it out for discussion.

Use case: My work involves Monte Carlo Tree Searches (MCTS) of games, eventually in combination with TensorFlow. MCTS involves repeatedly copying the state of a simulation to explore the potential outcomes of various choices in depth. If you're doing a game like Chess or Go, a game state is dead simple to summarize: you have a list of board positions with which pieces they have and whose turn it is. If you're doing complex games that don't have an easily summarized state at any given moment, you start running into problems. Think something along the lines of Magic: The Gathering, with complex turn sequences between players and effect resolutions being done in certain orders that are dependent on choices made by players, etc. Generators are an ideal way to run these types of simulations, but the inability to copy the state of a generator makes it impossible to use them in MCTS.
As Python is increasingly used for data science, this use case will become increasingly common. Being able to copy generators would save a lot of work. Keep in mind, I don't necessarily propose that generators should be fully picklable; there are obviously a number of concerns and problems there. Just being able to duplicate the generator's state within the interpreter would be enough for my use case.

Workarounds:

1. The obvious choice is to refactor the simulation as an iterator that stores each state as something that's easily copied/pickled. It's probably possible, but it'll require a lot of thought and code for each type of simulation.
2. There's a Python 2 package from 2009 called generator_tools <https://pypi.org/project/generator_tools/> that purports to do this. I haven't tried it yet to see if it still works in 2.x, and it appears beyond my skill level to port to 3.x.
3. PyPy & Stackless Python apparently support this within certain limits?

Thoughts?

Washington, DC USA
ffaristocrat@gmail.com
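Workaround #1 can be sketched briefly. This is a made-up toy (the class name, `send` method, and "board" are all hypothetical stand-ins for a real simulation), but it shows the key property: state kept in plain attributes instead of generator locals makes copying trivial.

```python
import copy

class GameIterator:
    """Toy simulation whose state lives in plain attributes, so deepcopy works."""
    def __init__(self):
        self.turn = 0
        self.board = [0] * 9  # stand-in for real game state

    def send(self, response):
        if response is not None:
            self.board[self.turn % 9] = response
        self.turn += 1
        return list(self.board)  # the "choices" at the next decision point

game1 = GameIterator()
game1.send(None)              # progress to the first decision point
game2 = copy.deepcopy(game1)  # copying is now trivial
assert game1.send(5) == game2.send(5)
```

The cost, as noted above, is rewriting each simulation's control flow by hand instead of letting the generator's suspended frame carry it.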

The state of a generator is not much more than a single Python stack frame plus an integer indicating where in the bytecode the resume point is. But copying/pickling a stack frame is complicated -- it's not just all the locals but also the try/except stack and the expression evaluation stack. Have a look here: https://github.com/python/cpython/blob/master/Include/frameobject.h. I'm not sure that I want to sign up for making all that stuff copyable (pickling will be an even harder challenge). But perhaps you (and/or another fearless hacker) are interested in trying? Or were you just trying to see if the core dev team has spare cycles to implement this for you? --Guido

On Tue, Jun 19, 2018 at 3:56 PM Micheál Keane <ffaristocrat@gmail.com> wrote:
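Part of the frame state described here is already visible from Python via CPython-specific attributes, which gives a feel for what a `copy()` would have to capture (this only inspects the state; it does not copy it):

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
next(gen)  # advance to the first yield

frame = gen.gi_frame            # the suspended stack frame (CPython)
print(frame.f_lasti)            # integer bytecode offset of the resume point
print(dict(frame.f_locals))     # {'n': 3} -- the locals that would need copying
```

What is *not* reachable from pure Python is exactly the hard part: the expression evaluation stack and the try/except block stack.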
-- --Guido van Rossum (python.org/~guido)

I wanted to sound out a couple of things. First, I couldn't find any real discussion about it after 2011, so I had no idea whether the reasons it was ruled unfeasible with Python 2 still held nearly 10 years later with Python 3. I was mainly wondering if all the recent asynchronous work had changed things significantly. Apparently not?

Secondly, one SO comment had included the suggestion that it be posted to this list - my searching couldn't find that it ever had been, so here it is.

Finally, another comment made the point that there wasn't a strong use case given for it. With the data science libraries that have sprung up around Python in the intervening years, I believe there now is one.

Washington, DC USA
ffaristocrat@gmail.com

On Wed, Jun 20, 2018 at 12:25 AM, Guido van Rossum <guido@python.org> wrote:

On Wed, Jun 20, 2018 at 6:34 AM Micheál Keane <ffaristocrat@gmail.com> wrote: [..]
First, I couldn't find any real discussion about it after 2011 so I had no idea if the reasons it was ruled unfeasible with Python 2 still held nearly 10 years later with Python 3. I was mainly wondering if all the recent asynchronous work had changed things significantly. Apparently not?
No, as message passing works well enough. The code is certainly more readable when you don't have some "global state pickling" kind of magic.
Finally, another comment made the point that there wasn't a strong use case given for it. With the data science libraries that have sprung up around Python in the intervening years, I believe there now is one.
As Guido has pointed out, pickling generators would require proper pickling of the entire frame stack (otherwise generators that use "global" or "nonlocal" won't unpickle correctly). Ideally we should also pickle thread locals and contextvars. Even if Python supported that, pickling and unpickling generators would be a slow operation, to the point of being impracticable (and JIT-based Python implementations would probably use the slowest path for any code that involves frame pickling).

Instead you should try to encapsulate your state in a dedicated object that is easy to pickle and unpickle. Your generators can then work with that state object instead of storing the state implicitly in their local variables (this is similar to your workaround #1 but still allows you to work with generators; just don't use local variables). The state (or parts of it) can be an immutable collection/object, which will make it easier to copy/pass it by reference at any point.

While this approach requires more work than just encapsulating the state in generators, in the long run it should make your code simpler and more scalable.

Yury
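The suggested pattern can be sketched like this (the `GameState` class and `game` generator are made-up illustrations): all mutable state lives in an explicit object, so "copying the generator" reduces to deep-copying the state and re-creating a generator over the snapshot.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class GameState:
    turn: int = 0
    scores: list = field(default_factory=list)

def game(state):
    # All mutable state lives in `state`, never in generator locals,
    # so snapshotting the simulation is just deepcopy(state).
    while True:
        move = yield state.scores
        state.scores.append(move)
        state.turn += 1

state1 = GameState()
g1 = game(state1)
g1.send(None)                   # advance to the first yield

state2 = copy.deepcopy(state1)  # snapshot the state...
g2 = game(state2)               # ...and re-create a generator over it
g2.send(None)

assert g1.send(10) == g2.send(10) == [10]
```

The generator itself is stateless apart from its resume point, which is the same for every freshly created instance, so the snapshot fully determines the simulation.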

On Wed, 20 Jun 2018 12:15:18 -0400 Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Depends on what level of automatic (magic?) correctness you're expecting. A generator is conceptually an iterator expressed in a different syntax. If you define an iterator object, it will probably get pickling for free, yet pickling it won't bother serializing the global variables that are accessed from its __next__() and send() methods.

A generator needn't be different: you mainly have to be careful to serialize its module's __name__, so that you can look up the frame's global dict by module name when the generator is recreated. By contrast, closure variables would be an issue. But a first implementation could simply refuse to pickle generators that access an enclosing local state.

Regards

Antoine.
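The "iterator gets pickling for free" point can be demonstrated with a hand-written equivalent of a simple generator (`Countdown` is an illustrative class, not anything from the stdlib); because its state is plain instance attributes, the default pickle machinery handles it with no extra code:

```python
import pickle

class Countdown:
    """Hand-written iterator: pickles for free, state is plain attributes."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

it = Countdown(3)
next(it)  # consume one item (3); `it` now remembers n == 2
clone = pickle.loads(pickle.dumps(it))
assert list(it) == list(clone) == [2, 1]
```

Note that, exactly as described above, pickling serializes only the attributes; any globals the methods touch are resolved by name on unpickling.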

This was already posted in the thread, but https://github.com/llllllllll/cloudpickle-generators is just an extension to the standard pickle machinery and is able to support closures, nonlocals, and globals: https://github.com/llllllllll/cloudpickle-generators/blob/master/cloudpickle.... It can even support the exotic case of a generator closing over itself.

The state that needs to be serialized for a generator is:

1. the frame's locals
2. the frame's globals
3. the closure cells
4. the lasti of the frame
5. the frame's data stack
6. the frame's block stack
7. the frame's suspended exception*

* The frame's suspended exception is the exception that is stored when you have code like:

    try:
        raise ValueError()
    except Exception:
        yield value
        raise

The frame stores the (type, value, traceback) so that it can make the raise statement work after the yield.

You need to be careful to check for recursion in the globals and closure because the generator instance may get stored there. You also need to check the locals because the generator instance could be sent back into itself and stored in a local. You also need to check the data stack for recursion because the instance could be sent into itself and then left on the stack between yields, for example if you use a yield expression in the middle of a tuple creation:

    a = (the_generator_instance, (yield))

Extracting the lasti, data stack, block stack, and held exception requires a little C; the rest can be pulled from pure Python.

On Wed, Jun 20, 2018 at 12:27 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
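The suspended-exception behaviour described here is easy to observe directly: a bare `raise` after a `yield` still re-raises the exception that was active when the frame was suspended, which is precisely why that exception is part of the state a serializer must capture.

```python
# A generator suspended inside an except block keeps the active
# exception alive across the yield, so the bare `raise` still works.
def gen():
    try:
        raise ValueError("boom")
    except Exception:
        yield "handling"
        raise  # re-raises the ValueError saved in the suspended frame

g = gen()
print(next(g))   # "handling" -- suspended mid-except
try:
    next(g)      # resuming re-raises the stored ValueError
except ValueError as e:
    print(e)     # "boom"
```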

You might find this useful, either to use directly or as a source of inspiration: https://github.com/llllllllll/cloudpickle-generators -n On Tue, Jun 19, 2018, 15:55 Micheál Keane <ffaristocrat@gmail.com> wrote:

participants (7)
- Antoine Pitrou
- Guido van Rossum
- Ivan Levkivskyi
- Joseph Jevnik
- Micheál Keane
- Nathaniel Smith
- Yury Selivanov