
Hi all,

Sorry for cross-posting, but I think that this group may actually be more appropriate for this discussion. Previous thread is at: http://groups.google.com/group/comp.lang.python/browse_thread/thread/1df5105...

I am wondering if anything can be done about the COW (copy-on-write) problem when forking a Python process. I have found several discussions of this problem, but I have seen no proposed solutions or workarounds. My understanding of the problem is that an object's reference count is stored in the "ob_refcnt" field of the PyObject structure itself. When a process forks, its memory is initially not copied. However, if any references to an object are made or destroyed in the child process, the page in which that object's "ob_refcnt" field is located will be copied.

My first thought was the obvious one: make the ob_refcnt field a pointer into an array of all object refcounts stored elsewhere. However, I do not think that there would be a way of doing this without adding a lot of complexity. So my current thinking is that it should be possible to disable refcounting for an object. This could be done by adding a field to PyObject named "ob_optout". If ob_optout is true, then Py_INCREF and Py_DECREF will have no effect on the object:

    from refcount import optin, optout

    class Foo:
        pass

    mylist = [Foo() for _ in range(10)]

    optout(mylist)           # Sets ob_optout to true
    for element in mylist:
        optout(element)      # Sets ob_optout to true

    Fork_and_block_while_doing_stuff(mylist)

    optin(mylist)            # Sets ob_optout to false
    for element in mylist:
        optin(element)       # Sets ob_optout to false

I realize that using shared memory is a possible solution for many of the situations in which one would wish to use the above, but I think that there are enough situations where one wishes to use the OS's COW mechanism and is prohibited from doing so to warrant a fix.

Has anyone else looked into the COW problem? Are there workarounds and/or other plans to fix it? Does the solution I am proposing sound reasonable, or does it seem like overkill? Does anyone see any (technical) problems with it?

Thanks,
--jac
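As a rough illustration of the effect described above, here is a minimal Linux-only sketch (data sizes and sleep durations are arbitrary assumptions): the child only reads the dictionary, yet each refcount update dirties the page holding the object, and the Private_Dirty counters in /proc/<pid>/smaps give a rough count of the pages the kernel has had to copy.

    import os
    import time

    def private_dirty_kb(pid):
        # Sum the Private_Dirty counters (reported in kB) from /proc/<pid>/smaps.
        total = 0
        with open("/proc/%d/smaps" % pid) as f:
            for line in f:
                if line.startswith("Private_Dirty:"):
                    total += int(line.split()[1])
        return total

    # A few hundred MB of small objects in the parent.
    data = {i: str(i) * 50 for i in range(10**6)}

    pid = os.fork()
    if pid == 0:
        # Child: a read-only walk over the values.  Nothing is mutated, but the
        # refcount of each value is updated, dirtying the page that holds it.
        sum(len(v) for v in data.values())
        time.sleep(30)                 # stay alive so the parent can read our smaps
        os._exit(0)

    time.sleep(10)                     # crude: give the child time to finish the walk
    print("child Private_Dirty: %d kB" % private_dirty_kb(pid))
    os.waitpid(pid, 0)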

On Tue, Apr 12, 2011 at 2:42 PM, jac <john.theman.connor@gmail.com> wrote:
I do not think most people consider this a problem. Reference counting in the first place... now that is a problem. We shouldn't be doing it and instead should use a more modern, scalable form of garbage collection...

Immutable hashable objects in Python (or is it just strings?) can be interned using the intern() call. This means they will never be freed. But I do not believe the current implementation of interning prevents reference counting; it just adds them to an internal map of things (i.e. one final reference) so they'll never be freed.

The biggest drawback is one you can experiment with yourself. Py_INCREF and Py_DECREF are currently very simple. Adding a special case means you'd be adding an additional conditional check every time they are called (regardless of whether it is a special magic high reference count or a new field with a bit set indicating that reference counting is disabled for a given object). To find out if it is worth it, try adding code that does that and running the Python benchmarks to see what happens.

I like your idea of the refcount table being stored elsewhere to improve this particular copy-on-write issue, but I don't really see it as a problem a lot of people are encountering. Got data otherwise (obviously you are running into it... who else?)? I do not expect most people to fork() other than via the subprocess module, where it's followed by an exec().

-gps

On Tue, Apr 12, 2011 at 9:12 PM, Gregory P. Smith <greg@krypto.org> wrote:
Python interns some strings and small ints. The intern builtin ensures a string is in the former cache and isn't applicable for other objects; Python automatically interns strings that look like identifiers and you should never use the intern function yourself. These optimizations have nothing to do with reference counting and could be applicable under other garbage collection schemes. Reference counting doesn't mean that interned objects can never be freed; are you familiar with the idea of weak references? Reference counting is a pleasantly simple though somewhat outdated scheme. It is not nearly as limiting as I think you imagine it to be. Mike
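A quick sketch of what interning does and does not do (this is a CPython implementation detail, and in 3.x the function lives at sys.intern()):

    from sys import intern     # Python 3; in 2.x, intern() is a builtin

    a = "-".join(["copy", "on", "write"])   # built at runtime, so not auto-interned
    b = "-".join(["copy", "on", "write"])

    print(a == b)                   # True  -- equal values
    print(a is b)                   # False -- two distinct string objects
    print(intern(a) is intern(b))   # True  -- both now refer to one interned copy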

On 4/12/2011 9:32 PM, Mike Graham wrote:
Python interns some strings and small ints. The intern builtin ensures
intern is deprecated in 2.7 and gone in 3.x.
"Changed in version 2.3: Interned strings are not immortal (like they used to be in Python 2.2 and before); you must keep a reference to the return value of intern() around to benefit from it." -- Terry Jan Reedy

On 4/13/11, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 12 Apr 2011 23:40:02 -0400 Terry Reedy <tjreedy@udel.edu> wrote:
That's a rather strange sentence, because interned strings *are* immortal (until the interpreter is shut down).
The purpose of that change (which may no longer be effective; I haven't checked recently) was that they were no longer immortal. If the last reference outside the intern dictionary was removed, then the string was removed from the intern dictionary as well. Intern was a way to de-duplicate, but it didn't (by itself) make anything immortal. -jJ

On Wednesday 13 April 2011 at 09:47 -0400, Jim Jewett wrote:
They're de-facto immortal, since the user can't access the intern dictionary to remove these strings. That sentence looks like a very misleading way of explaining an implementation detail and making it look like a user-visible semantic change. Regards Antoine.

On 4/13/2011 10:14 AM, Antoine Pitrou wrote:
Quoted sentence was from 2.7. 3.2 has "Interned strings are not immortal; you must keep a reference to the return value of intern() around to benefit from it." This actually makes sense if true: if the user cannot access the string, it should go away. But I have no idea. -- Terry Jan Reedy

On Wed, Apr 13, 2011 at 7:42 AM, jac <john.theman.connor@gmail.com> wrote:
There's a clear workaround for the COW problem these days: use PyPy instead of CPython :)

Currently that workaround comes at a potentially high cost in compatibility with 3rd party C extensions, but that situation will naturally improve over time. Given that a lot of those compatibility problems arise *because* PyPy doesn't use refcounting natively, it's highly unlikely that there will be any significant tinkering with CPython's own approach.

As far as technical problems go, opting out of memory management is a beautiful way to shoot yourself in the foot with memory leaks. All it takes is one optout() without a corresponding optin() and an arbitrary amount of memory may fail to be released. For example, in your own post, any exception in Fork_and_block_while_doing_stuff() means anything referenced directly or indirectly from mylist will be left hanging around in memory until the process terminates. That's a *far* worse problem than being unable to readily share memory between processes.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
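If the hypothetical optin()/optout() API from the original post existed, the leak described here would at least need to be fenced with try/finally. A sketch along those lines follows; every name in it (refcount, optin, optout, Fork_and_block_while_doing_stuff) is hypothetical and taken from the original post, so this is not runnable against any real CPython build:

    from contextlib import contextmanager
    from refcount import optin, optout   # hypothetical module from the original post

    @contextmanager
    def refcounting_disabled(objs):
        # Disable refcounting for objs and re-enable it even if the body raises.
        for obj in objs:
            optout(obj)
        try:
            yield
        finally:
            for obj in objs:
                optin(obj)

    # Usage, with mylist and Fork_and_block_while_doing_stuff as in the original post:
    # with refcounting_disabled([mylist] + mylist):
    #     Fork_and_block_while_doing_stuff(mylist)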

On Fri, Apr 15, 2011 at 7:10 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I'm fairly sure one of the PyPy talks at Pycon specifically mentioned the CoW problem as one of the ways PyPy was able to save memory over CPython. The PyPy folks would be the ones to accurately answer questions like that, though. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan wrote:
The answer is probably something along the lines that only the most active parts of the heap get copied and the rest are left alone most of the time. Otherwise copying GCs would be really bad for other reasons too, such as causing a lot of paging and cache invalidation. -- Greg

There's a clear workaround for the COW problem these days: use PyPy instead of CPython :)
Thanks for the tip. I haven't looked at PyPy in a while; it looks like it has come a long way. I will have to change some of my code around to work with 2.5, but it shouldn't be too bad. As far as I am concerned, if PyPy works for this, problem solved. Thanks again, --jac On Apr 13, 1:17 am, Nick Coghlan <ncogh...@gmail.com> wrote:

On Tue, 12 Apr 2011 14:42:43 -0700 (PDT) jac <john.theman.connor@gmail.com> wrote:
This smells like premature optimization to me. You're worried about the kernel copying a few extra pages of user data when you're dealing with a dictionary that's gigabytes in size. Sounds like any possible memory savings here would be much smaller than those that could come from improving the data encoding.

But maybe it's not premature. Do you have measurements that show how much extra swap space is taken up by COW copies caused by changing reference counts in your application?

<mike

--
Mike Meyer <mwm@mired.org> http://www.mired.org/consulting.html
Independent Software developer/SCM consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

In my case, memory almost the entire size of the dictionary is being copied into the child process. But for any specific case there will be several factors involved:

- The size of a page on the OS
- How many elements of the dictionary are accessed
- The size of the objects in the dictionary (keys and values)
- The distribution of the objects in memory (for small objects you may have more than one on a page, etc.)
- etc.

But I think that it is possible in some cases that *more* memory than the entire dictionary will be copied into the child process's memory. I think that this would happen if each key/value pair of the dictionary were iterated over, if the page size of the OS was larger than the size of an object, and if the objects were arranged in memory such that no two objects were contiguous.

--jac

On Apr 13, 3:34 am, Mike Meyer <m...@mired.org> wrote:
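A rough back-of-the-envelope sketch of the worst case jac describes above; every number below is an illustrative assumption, not a measurement from his application:

    PAGE_SIZE = 4096            # bytes; typical x86 Linux page
    n_touched = 2 * 10**7       # keys + values visited while iterating the dict (assumed)
    obj_size = 56               # assumed average size of a small key/value object

    object_data = n_touched * obj_size   # what the dict's contents actually occupy
    best_case = object_data              # objects densely packed, whole pages useful
    worst_case = n_touched * PAGE_SIZE   # one touched object per page (capped by heap size)

    print("object data        : %6.1f GiB" % (object_data / 2.0**30))
    print("best-case COW copy : %6.1f GiB" % (best_case / 2.0**30))
    print("worst-case COW copy: %6.1f GiB" % (worst_case / 2.0**30))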

On 4/12/11, jac <john.theman.connor@gmail.com> wrote:
This also causes some problems in a single process attempting to run on multiple cores, because that change invalidates the cache.
My first thought was the obvious one: make the ob_refcnt field a pointer into an array of all object refcounts stored elsewhere.
Good thought, and probably needed for some types of parallelism. The problem is that it also means that actually using the object will require loading from at least two memory areas -- one to update the reference count, the other for the object itself, which may or may not be changed. For relatively small objects, you would effectively be cutting your cache size in half, in addition to the new calculations. It takes a lot of benefit for that to pay back, and it may be simpler to just go with PyPy and an alternate memory management scheme. -jJ

participants (9): Antoine Pitrou, Greg Ewing, Gregory P. Smith, jac, Jim Jewett, Mike Graham, Mike Meyer, Nick Coghlan, Terry Reedy