[Python-ideas] Copy-on-write when forking a python process

Gregory P. Smith greg at krypto.org
Wed Apr 13 03:12:52 CEST 2011


On Tue, Apr 12, 2011 at 2:42 PM, jac <john.theman.connor at gmail.com> wrote:
> Hi all,
> Sorry for cross posting, but I think that this group may actually be
> more appropriate for this discussion.  Previous thread is at:
> http://groups.google.com/group/comp.lang.python/browse_thread/thread/1df510595483b12f
>
> I am wondering if anything can be done about the COW (copy-on-write)
> problem when forking a python process.  I have found several
> discussions of this problem, but I have seen no proposed solutions or
> workarounds.  My understanding of the problem is that an object's
> reference count is stored in the "ob_refcnt" field of the PyObject
> structure itself.  When a process forks, its memory is initially not
> copied. However, if any references to an object are made or destroyed
> in the child process, the page in which the object's "ob_refcnt" field
> is located will be copied.
> My first thought was the obvious one: make the ob_refcnt field a
> pointer into an array of all object refcounts stored elsewhere.
> However, I do not think that there would be a way of doing this
> without adding a lot of complexity.  So my current thinking is that it
> should be possible to disable refcounting for an object.  This could
> be done by adding a field to PyObject named "ob_optout".  If ob_optout
> is true, then Py_INCREF and Py_DECREF will have no effect on the
> object:
>
> from refcount import optin, optout
> class Foo: pass
> mylist = [Foo() for _ in range(10)]
> optout(mylist)  # Sets ob_optout to true
> for element in mylist:
>     optout(element)  # Sets ob_optout to true
> Fork_and_block_while_doing_stuff(mylist)
> optin(mylist) # Sets ob_optout to false
> for element in mylist:
>     optin(element)  # Sets ob_optout to false
>
> I realize that using shared memory is a possible solution for many of
> the situations in which one would want the above mechanism, but I think
> that there are enough situations where one wishes to use the OS's COW
> mechanism and is prohibited from doing so to warrant a fix.
>
> Has anyone else looked into the COW problem?  Are there workarounds
> and/or other plans to fix it?  Does the solution I am proposing sound
> reasonable, or does it seem like overkill?  Does anyone see any
> (technical) problems with it?

I do not think most people consider this a problem.

Reference counting in the first place... now that is a problem.  We
shouldn't be doing it at all and should instead use a more modern,
scalable form of garbage collection.  Immutable hashable objects in
Python (or is it just strings?) can be interned using the intern()
call, meaning they will never be freed.  But I do not believe the
current implementation of interning prevents reference counting; it
just adds them to an internal map (i.e. one final reference) so that
they are never freed.
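
You can see this at the interpreter prompt (a quick sketch; intern()
is the Python 2 builtin, spelled sys.intern() on Python 3):

    import sys
    s = intern('x' * 100)      # sys.intern('x' * 100) on Python 3
    print(sys.getrefcount(s))  # a live refcount like any other object's;
                               # the intern map just holds one extra reference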

The biggest drawback is the runtime overhead, and it is one you can
experiment with yourself.

Py_INCREF and Py_DECREF are currently very simple.  Adding a special
case means adding an additional conditional check every time they are
called (regardless of whether the special case is a magic high
reference count or a new field with a bit set indicating that
reference counting is disabled for a given object).

To find out if it is worth it, try adding code that does that, then
run the Python benchmarks and see what happens.
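
Even a crude micro-benchmark gives a first answer, since merely
iterating a list INCREFs and DECREFs every element (a sketch; run it
under both a patched and an unpatched build and compare):

    import timeit
    # The loop body is empty on purpose: the time is dominated by the
    # refcount traffic of pushing and popping each element.
    print(timeit.timeit('for x in data: pass',
                        setup='data = [object()] * 100000',
                        number=1000))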

I like your idea of storing the refcount table elsewhere to improve
this particular copy-on-write issue, but I don't really see it as a
problem a lot of people are encountering.  Got data otherwise
(obviously you are running into it... who else?)?  I do not expect
most people to fork() other than via the subprocess module, where the
fork() is immediately followed by an exec().
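
(If you do want to observe the copy-on-write effect in question, here
is a rough sketch for Linux.  While the child runs, the formerly
shared pages get copied one by one; that shows up as Private_Dirty in
/proc/<child-pid>/smaps, or as a drop in free(1):)

    import os, time

    data = [str(i) for i in range(10 ** 6)]  # lots of small objects

    pid = os.fork()
    if pid == 0:
        time.sleep(5)      # pages are still shared with the parent here
        for obj in data:   # iterating INCREFs/DECREFs each element,
            pass           # forcing the kernel to copy those pages
        time.sleep(5)      # compare the child's memory before and after
        os._exit(0)
    os.waitpid(pid, 0)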

-gps


