[Python-ideas] solving multi-core Python
abarnert at yahoo.com
Sun Jun 21 23:08:09 CEST 2015
First, a minor question: instead of banning fork entirely within subinterpreters, why not just document that it is illegal to do anything between fork and exec in a subinterpreter, except for a very small (but possibly extensible) subset of Python? For example, after fork, you can no longer access any channels, and you also can't use signals, threads, fork again, imports, assignments to builtins, raising exceptions, or a whole host of other things (but of course if you exec an entirely new Python interpreter, it can do any of those things). C extension modules could just have a flag that marks whether the whole module is fork-safe or not (defaulting to not). This allows a subinterpreter to use subprocess (or even multiprocessing, as long as you use the forkserver or spawn mechanism), and it gives code that intentionally wants to do tricky or dangerous things a way to do them, while avoiding all of the problems of accidentally breaking a subinterpreter by forking it and then doing bad things.
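To make the "almost nothing between fork and exec" idea concrete, here is a rough sketch in ordinary (non-subinterpreter) Python on a POSIX system. The helper name run_in_fresh_interpreter is made up for illustration, and which operations would actually be permitted in the restricted subset is an assumption on my part; the child only rearranges file descriptors before it calls exec.

```python
import os
import sys


def run_in_fresh_interpreter(code: str) -> str:
    """Fork, exec a brand-new Python running `code`, and return its stdout.

    Hypothetical helper: between fork and exec the child only touches
    file descriptors (no imports, threads, signals, or channels).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the only work done here is fd plumbing, then exec.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)  # make the pipe the child's stdout
            os.close(write_fd)
            os.execv(sys.executable, [sys.executable, "-c", code])
        finally:
            # Reached only if something above failed; never unwind
            # back into the parent's code in the forked child.
            os._exit(127)

    # Parent: read everything the new interpreter prints, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        output = reader.read()
    os.waitpid(pid, 0)
    return output.decode()


if __name__ == "__main__":
    print(run_in_fresh_interpreter("print(6 * 7)"), end="")
```

Once the exec succeeds, the new interpreter is free to import, start threads, fork again, and so on, which is the point of the carve-out.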
Second, a major question: In this proposal, are builtins and the modules map shared, or copied?
If they're copied, it seems like it would be hard to do that even as efficiently as multiprocessing, much less more efficiently. Of course you could fake this with CoW, but I'm not sure how you'd do that, short of CoWing the entire heap (by using clone instead of pthreads on Linux, or by doing a bunch of explicit mmap and related calls on other POSIX systems), at which point you're pretty close to just implementing fork or vfork yourself to avoid calling fork or vfork, and unlikely to get it as efficient or as robust as what's already there.
If they're shared, on the other hand, then it seems like it becomes very difficult to implement subinterpreter-safe code, because it's no longer safe to import a module, set a flag, call a registration function, etc.
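A small single-interpreter analogy of the shared case, in case it helps. The module name shared_config and the two task functions are invented for the example; they stand in for two subinterpreters that would see the same modules map. Nothing here uses a real subinterpreter API.

```python
import sys
import types

# Hypothetical module standing in for any library with module-level
# state: a flag and a registry that callers are expected to mutate.
shared = types.ModuleType("shared_config")
shared.debug = False
shared.handlers = []
sys.modules["shared_config"] = shared


def task_a():
    # Stand-in for code in one subinterpreter: import, set a flag,
    # call a "registration function".
    import shared_config as mod
    mod.debug = True
    mod.handlers.append("a_handler")


def task_b():
    # Stand-in for unrelated code in another subinterpreter that
    # imports the same module and expects pristine state.
    import shared_config as mod
    return mod.debug, list(mod.handlers)


before = task_b()
task_a()
after = task_b()
print(before, after)
```

With a shared modules map, task_b sees task_a's flag and handler even though the two never communicated explicitly, which is exactly the kind of accidental coupling that makes subinterpreter-safe code hard to write.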