[Python-ideas] Copy-on-write when forking a python process

jac john.theman.connor at gmail.com
Tue Apr 12 23:42:43 CEST 2011


Hi all,
Sorry for cross posting, but I think that this group may actually be
more appropriate for this discussion.  Previous thread is at:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/1df510595483b12f

I am wondering if anything can be done about the COW (copy-on-write)
problem when forking a Python process.  I have found several
discussions of this problem, but I have seen no proposed solutions or
workarounds.  My understanding of the problem is that an object's
reference count is stored in the "ob_refcnt" field of the PyObject
structure itself.  When a process forks, its memory is initially shared
with the child rather than copied.  However, if any references to an
object are made or destroyed in the child process, the page in which the
object's "ob_refcnt" field is located will be copied.
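
As a rough illustration of why even read-only use of an object writes to its memory, the sketch below uses sys.getrefcount to watch the count change when a reference is made and then destroyed.  The names Foo, obj and alias are only for illustration, and the absolute counts differ between interpreter versions, so only the differences are compared:

```python
import sys

class Foo:
    pass

obj = Foo()
# The value includes a temporary reference held by the getrefcount call itself.
before = sys.getrefcount(obj)

alias = obj                          # making a reference updates ob_refcnt
after_alias = sys.getrefcount(obj)

del alias                            # destroying it updates ob_refcnt again
after_del = sys.getrefcount(obj)

print(after_alias - before, after_del - before)
```

In a forked child, each of those updates is a write to the page holding the object, which is what forces the copy.
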
My first thought was the obvious one: make the ob_refcnt field a
pointer into an array of all object refcounts stored elsewhere.
However, I do not think that there would be a way of doing this
without adding a lot of complexity.  So my current thinking is that it
should be possible to disable refcounting for an object.  This could
be done by adding a field to PyObject named "ob_optout".  If ob_optout
is true then Py_INCREF and Py_DECREF will have no effect on the
object:

# Proposed usage only: the "refcount" module and its optin/optout
# functions do not exist yet.
from refcount import optin, optout

class Foo:
    pass

mylist = [Foo() for _ in range(10)]
optout(mylist)  # Sets ob_optout to true
for element in mylist:
    optout(element)  # Sets ob_optout to true
Fork_and_block_while_doing_stuff(mylist)  # Placeholder for the forking work
optin(mylist)  # Sets ob_optout to false
for element in mylist:
    optin(element)  # Sets ob_optout to false
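
To make the intended semantics concrete, here is a toy model in pure Python.  It is only a sketch: ToyObject, incref and decref are hypothetical stand-ins for the PyObject header, Py_INCREF and Py_DECREF, and nothing here reflects how CPython is actually implemented:

```python
class ToyObject:
    """Toy stand-in for a PyObject header; not CPython's real layout."""

    def __init__(self):
        self.ob_refcnt = 1        # one reference held by the creator
        self.ob_optout = False    # the proposed opt-out flag


def incref(obj):
    # Models the proposed Py_INCREF: skip the write when opted out.
    if not obj.ob_optout:
        obj.ob_refcnt += 1


def decref(obj):
    # Models the proposed Py_DECREF: skip the write when opted out.
    if not obj.ob_optout:
        obj.ob_refcnt -= 1


obj = ToyObject()
incref(obj)                   # counted: ob_refcnt goes from 1 to 2
obj.ob_optout = True          # what optout(obj) would do
incref(obj)                   # ignored: ob_refcnt is not written
decref(obj)                   # ignored: ob_refcnt is not written
frozen_count = obj.ob_refcnt  # unchanged while opted out
obj.ob_optout = False         # what optin(obj) would do
decref(obj)                   # counted again: ob_refcnt goes from 2 to 1
```

While the flag is set, the count is only read, never written, so the page holding it would stay shared.  The model also shows that references made while opted out are not tracked, so the caller would have to make sure none of them outlive the optin call.
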

I realize that using shared memory is a possible solution for many of
the situations in which one would wish to use the above solution, but I
think that there are enough situations where one wishes to use the OS's
COW mechanism and is prohibited from doing so to warrant a fix.

Has anyone else looked into the COW problem?  Are there workarounds
and/or other plans to fix it?  Does the solution I am proposing sound
reasonable, or does it seem like overkill?  Does anyone see any
(technical) problems with it?

Thanks,
--jac


