[Python-ideas] Copy-on-write when forking a python process
jac
john.theman.connor at gmail.com
Tue Apr 12 23:42:43 CEST 2011
Hi all,
Sorry for cross posting, but I think that this group may actually be
more appropriate for this discussion. Previous thread is at:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/1df510595483b12f
I am wondering if anything can be done about the COW (copy-on-write)
problem when forking a python process. I have found several
discussions of this problem, but I have seen no proposed solutions or
workarounds. My understanding of the problem is that an object's
reference count is stored in the "ob_refcnt" field of the PyObject
structure itself. When a process forks, its memory is initially not
copied. However, if any references to an object are created or destroyed
in the child process, the page containing that object's "ob_refcnt" field
is written to and therefore copied.
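The refcount write described above can be observed directly. This is a minimal sketch that assumes CPython; the class Foo and the names obj, alias and delta are only illustrative. Binding a second name to an object changes the count stored inside the object itself, which is the kind of write that would dirty a shared page after a fork.

```python
import sys


class Foo:
    pass


obj = Foo()

# sys.getrefcount reports the value held in the object's ob_refcnt field
# (plus one temporary reference for the call itself).
before = sys.getrefcount(obj)

# Merely binding another name to the object increments ob_refcnt,
# i.e. it writes to the memory that holds the object.
alias = obj
after = sys.getrefcount(obj)

delta = after - before
print("refcount change from one extra reference:", delta)
```

The object is never modified from the Python point of view, yet its memory is written to.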
My first thought was the obvious one: make the ob_refcnt field a
pointer into an array of all object refcounts stored elsewhere.
However, I do not think that there would be a way of doing this
without adding a lot of complexity. So my current thinking is that it
should be possible to disable refcounting for an object. This could
be done by adding a field to PyObject named "ob_optout". If ob_optout
is true, then Py_INCREF and Py_DECREF will have no effect on the
object:
# "refcount" is the proposed module; it does not exist today.
from refcount import optin, optout

class Foo: pass

mylist = [Foo() for _ in range(10)]
optout(mylist)  # Sets ob_optout to true
for element in mylist:
    optout(element)  # Sets ob_optout to true
Fork_and_block_while_doing_stuff(mylist)
optin(mylist)  # Sets ob_optout to false
for element in mylist:
    optin(element)  # Sets ob_optout to false
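The intended semantics of the flag can be modelled in pure Python. This is only a toy simulation under stated assumptions: ToyObject, incref and decref are hypothetical stand-ins for PyObject, Py_INCREF and Py_DECREF, and nothing here touches CPython's real reference counts.

```python
class ToyObject:
    """Hypothetical stand-in for PyObject carrying the proposed flag."""

    def __init__(self):
        self.ob_refcnt = 1
        self.ob_optout = False


def incref(obj):
    # Models Py_INCREF: skipped entirely while the object is opted out.
    if not obj.ob_optout:
        obj.ob_refcnt += 1


def decref(obj):
    # Models Py_DECREF: skipped entirely while the object is opted out.
    if not obj.ob_optout:
        obj.ob_refcnt -= 1


obj = ToyObject()
incref(obj)               # counted: 1 -> 2

obj.ob_optout = True      # what optout(obj) would do
incref(obj)               # ignored, so the count is not written
decref(obj)               # ignored, so the count is not written
frozen = obj.ob_refcnt    # still 2

obj.ob_optout = False     # what optin(obj) would do
decref(obj)               # counted again: 2 -> 1
```

While the flag is set, the count is never written, so the page holding it would stay shared between parent and child.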
I realize that shared memory is a possible solution in many of the
situations where one would wish to use the above approach, but I think
that there are enough situations where one wishes to use the OS's COW
mechanism and is prevented from doing so to warrant a fix.
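For comparison, the shared-memory route might look roughly like the sketch below. It is POSIX-only (it relies on os.fork and an anonymous shared mmap), and the helper name sum_in_child is hypothetical. The data lives in a raw buffer rather than in Python objects, so reading it in the child does not touch any per-object refcount for the payload.

```python
import mmap
import os
import struct


def sum_in_child(values):
    """Fork a child that sums integers read from a shared anonymous mapping."""
    fmt = f"{len(values)}q"
    payload = struct.pack(fmt, *values)

    # Anonymous mapping, shared across fork on Unix; the final 8 bytes
    # are reserved for the child's result.
    buf = mmap.mmap(-1, len(payload) + 8)
    buf[:len(payload)] = payload

    pid = os.fork()
    if pid == 0:
        # Child: read the raw bytes and write the result back into the mapping.
        total = sum(struct.unpack_from(fmt, buf, 0))
        struct.pack_into("q", buf, len(payload), total)
        os._exit(0)

    # Parent: wait for the child, then read the result it stored.
    os.waitpid(pid, 0)
    (total,) = struct.unpack_from("q", buf, len(payload))
    buf.close()
    return total


result = sum_in_child([1, 2, 3, 4])
print(result)
```

The cost is that the data has to be packed into a flat buffer first, which is exactly the inconvenience that makes plain COW attractive for ordinary Python objects.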
Has anyone else looked into the COW problem? Are there workarounds
and/or other plans to fix it? Does the solution I am proposing sound
reasonable, or does it seem like overkill? Does anyone see any
(technical) problems with it?
Thanks,
--jac