From the manual [8.11]:
A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else.
This leads to a difference in behaviour between CPython and the other implementations: CPython will (currently) immediately destroy any objects that only have weak references to them, with the result that trying to access said object will require making a new one; other implementations (at least PyPy, and presumably the others that don't use reference-counting GCs) can "reach into the grave" and pull back objects that don't have any strong references left.

I would like the guarantees for weakrefs strengthened such that any weakref'ed object that has no strong references left will return None instead of the object, even if the object has not yet been garbage collected.

Without this stronger guarantee, programs that rely on weakrefs disappearing when strong refs are gone end up relying on the GC method instead, with the result that the program behaves differently on different implementations.

~Ethan~
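The difference described here can be sketched with a minimal example (the `Thing` class is a placeholder, not from any real code; the commented behaviour is implementation-dependent):

```python
import weakref

class Thing(object):
    pass

obj = Thing()
ref = weakref.ref(obj)
assert ref() is obj   # a strong reference still exists

del obj               # drop the last strong reference

# On CPython, refcounting destroys the object immediately and the
# weakref is cleared, so ref() returns None here.  On PyPy and other
# tracing-GC implementations, ref() may still return the original
# object until a collection actually runs.
print(ref())
```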
On Thu, 17 May 2012 08:10:40 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
From the manual [8.11]:
A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else.
This leads to a difference in behaviour between CPython and the other implementations: CPython will (currently) immediately destroy any objects that only have weak references to them with the result that trying to access said object will require making a new one;
This is only true if the object isn't caught in a reference cycle.
Without this stronger guarantee programs that are relying on weakrefs to disappear when strong refs are gone end up relying on the gc method instead, with the result that the program behaves differently on different implementations.
Why would they "rely on weakrefs to disappear when strong refs are gone"? What is the use case? Regards Antoine.
On Thu, May 17, 2012 at 8:44 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 17 May 2012 08:10:40 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
From the manual [8.11]:
A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else.
This leads to a difference in behaviour between CPython and the other implementations: CPython will (currently) immediately destroy any objects that only have weak references to them with the result that trying to access said object will require making a new one;
This is only true if the object isn't caught in a reference cycle.
To further this, consider the following example, run in CPython 2.6:

>>> import weakref
>>> import gc
>>> class O(object):
...     pass
...
>>> a = O()
>>> b = O()
>>> a.x = b
>>> b.x = a
>>> w = weakref.ref(a)
>>> del a, b
>>> print w()
<__main__.O object at 0x0000000003C78B38>
>>> gc.collect()
20
>>> print w()
None
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
Ethan Furman wrote:
I would like to have the guarantees for weakrefs strengthened such that any weakref'ed object that has no strong references left will return None instead of the object, even if the object has not yet been garbage collected.
Why do you want this guarantee? It would complicate implementations for which ref counting is not the native method of managing memory. -- Greg
On May 17, 8:10 am, Ethan Furman wrote:
From the manual [8.11]:
A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else.
This leads to a difference in behaviour between CPython and the other implementations: CPython will (currently) immediately destroy any objects that only have weak references to them with the result that trying to access said object will require making a new one; other implementations (at least PyPy, and presumably the others that don't use ref-count gc's) can "reach into the grave" and pull back objects that don't have any strong references left.
Antoine Pitrou wrote:
This is only true if the object isn't caught in a reference cycle.
Good point -- so I would like the proposed change in CPython as well.

Ethan Furman wrote:
I would like to have the guarantees for weakrefs strengthened such that any weakref'ed object that has no strong references left will return None instead of the object, even if the object has not yet been garbage collected.
Without this stronger guarantee programs that are relying on weakrefs to disappear when strong refs are gone end up relying on the gc method instead, with the result that the program behaves differently on different implementations.
Antoine Pitrou wrote:
Why would they "rely on weakrefs to disappear when strong refs are gone"? What is the use case?
Greg Ewing wrote:
Why do you want this guarantee? It would complicate implementations for which ref counting is not the native method of managing memory.
My dbf module provides direct access to dbf files. A retrieved record is a singleton object and allows temporary changes that are not written to disk. Whether those changes are seen by the next incarnation depends (I had thought) on whether or not the record with the unwritten changes has gone out of scope.

I see two questions that determine whether this change should be made:

1) How difficult would it be for the non-ref-counting implementations to implement?

2) Is it appropriate to have objects be changed but not saved, and then discarded when the strong references are gone so the next incarnation doesn't see the changes, even if the object hasn't been destroyed yet?

~Ethan~

FYI: For dbf I am going to disallow temporary changes, so this won't be an immediate issue for me.
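The record-cache pattern at issue can be sketched with `weakref.WeakValueDictionary` (the `Record` and `Table` classes below are hypothetical stand-ins, not the actual dbf API):

```python
import weakref

class Record(object):
    """Hypothetical stand-in for a dbf record (not the real dbf API)."""
    def __init__(self, index):
        self.index = index
        self.dirty_fields = {}   # temporary changes, never written to disk

class Table(object):
    """Sketch of a weakref-based singleton cache for records."""
    def __init__(self):
        # entries vanish when the last strong reference to a Record goes away
        self._cache = weakref.WeakValueDictionary()

    def get_record(self, index):
        rec = self._cache.get(index)
        if rec is None:
            rec = Record(index)      # a fresh "incarnation"
            self._cache[index] = rec
        return rec

table = Table()
a = table.get_record(0)
a.dirty_fields['name'] = 'temp'      # a temporary, unwritten change
del a                                # drop the last strong reference

# Under CPython's refcounting, the cache entry is cleared immediately and
# the next lookup builds a fresh Record without the temporary change;
# under a tracing GC, the old, still-modified Record may be handed back.
b = table.get_record(0)
```

Which branch you observe is exactly the implementation difference under discussion in this thread.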
On 2012-05-18, at 18:08 , stoneleaf wrote:
My dbf module provides direct access to dbf files. A retrieved record is a singleton object, and allows temporary changes that are not written to disk. Whether those changes are seen by the next incarnation depends on (I had thought) whether or not the record with the unwritten changes has gone out of scope.
If a record is a singleton, that singleton-ification would be handled through weakrefs would it not? In that case, until the GC is triggered (and the weakref is invalidated), you will keep getting your initial singleton and there will be no "next record", I fail to see why that would be an issue.
I see two questions that determine whether this change should be made:
1) How difficult it would be for the non-ref counting implementations to implement
Pretty much impossible I'd expect, the weakrefs can only be broken on GC runs (at object deallocation) and that is generally non-deterministic without specifying precisely which type of GC implementation is used. You'd need a fully deterministic deallocation model to ensure a weakref is broken as soon as the corresponding object has no outstanding strong (and soft, in some VMs like the JVM) reference.
2) Whether it's appropriate to have objects be changed, but not saved, and then discarded when the strong references are gone so the next incarnation doesn't see the changes, even if the object hasn't been destroyed yet.
If your saves are synchronized with the weakref being broken (the object being *effectively* collected) and the singleton behavior is as well, there will be no difference, I'm not sure what the issue would be, you might just have a second change cycle using the same unsaved (but still modified) object. Although frankly speaking such reliance on non-deterministic events would scare the shit out of me.
On May 18, 9:38 am, Masklinn wrote:
On 2012-05-18, at 18:08 , stoneleaf wrote:
My dbf module provides direct access to dbf files. A retrieved record is a singleton object, and allows temporary changes that are not written to disk. Whether those changes are seen by the next incarnation depends on (I had thought) whether or not the record with the unwritten changes has gone out of scope.
If a record is a singleton, that singleton-ification would be handled through weakrefs would it not?
Indeed, that is the current behavior.
In that case, until the GC is triggered (and the weakref is invalidated), you will keep getting your initial singleton and there will be no "next record", I fail to see why that would be an issue.
Because, since I had only been using CPython, I was able to count on records that had gone out of scope disappearing along with their _temporary_ changes. If I get that same record back the next time I loop through the table -- well, then the changes weren't temporary, were they?
I see two questions that determine whether this change should be made:
1) How difficult it would be for the non-ref counting implementations to implement
Pretty much impossible I'd expect, the weakrefs can only be broken on GC runs (at object deallocation) and that is generally non-deterministic without specifying precisely which type of GC implementation is used. You'd need a fully deterministic deallocation model to ensure a weakref is broken as soon as the corresponding object has no outstanding strong (and soft, in some VMs like the JVM) reference.
2) Whether it's appropriate to have objects be changed, but not saved, and then discarded when the strong references are gone so the next incarnation doesn't see the changes, even if the object hasn't been destroyed yet.
If your saves are synchronized with the weakref being broken (the object being *effectively* collected) and the singleton behavior is as well, there will be no difference, I'm not sure what the issue would be, you might just have a second change cycle using the same unsaved (but still modified) object.
And that's exactly the problem -- I don't want to see the modifications the second time 'round, and if I can't count on weakrefs invalidating as soon as the strong refs are gone I'll have to completely rethink how I handle records from the table.
Although frankly speaking such reliance on non-deterministic events would scare the shit out of me.
Indeed -- I hadn't realized that I was until somebody using PyPy noticed the problem. ~Ethan~
On 19 May 2012 03:54, stoneleaf <ethan@stoneleaf.us> wrote:
On May 18, 9:38 am, Masklinn wrote:
On 2012-05-18, at 18:08 , stoneleaf wrote:
My dbf module provides direct access to dbf files. A retrieved record is a singleton object, and allows temporary changes that are not written to disk. Whether those changes are seen by the next incarnation depends on (I had thought) whether or not the record with the unwritten changes has gone out of scope.
If a record is a singleton, that singleton-ification would be handled through weakrefs would it not?
Indeed, that is the current behavior.
In that case, until the GC is triggered (and the weakref is invalidated), you will keep getting your initial singleton and there will be no "next record", I fail to see why that would be an issue.
Because, since I had only been using CPython, I was able to count on records that had gone out of scope disappearing along with their _temporary_ changes. If I get that same record back the next time I loop through the table -- well, then the changes weren't temporary, were they?
So you're taking a *dependence* on the reference-counting garbage collection of the CPython implementation, and when that doesn't work for you on other implementations, trying to force the same semantics on them.

Your proposal can't reasonably be implemented by other implementations, as working out whether there are any references to an object is an expensive operation.

A much better technique would be for you to use explicit life-cycle management (like the with statement) for your objects.

Michael
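The suggestion above can be sketched as a context manager that gives a record an explicit lifetime, so temporary changes are discarded deterministically on every implementation (`Record` and `RecordSession` are hypothetical, not the dbf API):

```python
class Record(object):
    """Hypothetical stand-in for a dbf record (not the real dbf API)."""
    def __init__(self):
        self.changes = {}            # temporary, unwritten changes

    def discard_changes(self):
        self.changes.clear()

class RecordSession(object):
    """Scope a record's temporary changes to a `with` block; they are
    discarded on exit regardless of which garbage collector is in use."""
    def __init__(self, record):
        self._record = record

    def __enter__(self):
        return self._record

    def __exit__(self, exc_type, exc, tb):
        self._record.discard_changes()   # deterministic cleanup
        return False                     # don't swallow exceptions

rec = Record()
with RecordSession(rec) as r:
    r.changes['name'] = 'temp'   # visible only inside the block
assert rec.changes == {}         # wiped on exit, on any implementation
```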
I see two questions that determine whether this change should be made:
1) How difficult it would be for the non-ref counting implementations to implement
Pretty much impossible I'd expect, the weakrefs can only be broken on GC runs (at object deallocation) and that is generally non-deterministic without specifying precisely which type of GC implementation is used. You'd need a fully deterministic deallocation model to ensure a weakref is broken as soon as the corresponding object has no outstanding strong (and soft, in some VMs like the JVM) reference.
2) Whether it's appropriate to have objects be changed, but not saved, and then discarded when the strong references are gone so the next incarnation doesn't see the changes, even if the object hasn't been destroyed yet.
If your saves are synchronized with the weakref being broken (the object being *effectively* collected) and the singleton behavior is as well, there will be no difference, I'm not sure what the issue would be, you might just have a second change cycle using the same unsaved (but still modified) object.
And that's exactly the problem -- I don't want to see the modifications the second time 'round, and if I can't count on weakrefs invalidating as soon as the strong refs are gone I'll have to completely rethink how I handle records from the table.
Although frankly speaking such reliance on non-deterministic events would scare the shit out of me.
Indeed -- I hadn't realized that I was until somebody using PyPy noticed the problem.
~Ethan~
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html
On May 19, 5:33 am, Michael Foord wrote:
So you're taking a *dependence* on the reference counting garbage collection of the CPython implementation, and when that doesn't work for you with other implementations trying to force the same semantics on them.
I am not trying to force anything. I stated what I would like, and followed up with questions to further the discussion.
Your proposal can't reasonably be implemented by other implementations as working out whether there are any references to an object is an expensive operation.
Then that nixes it. The (debatable) advantages aren't worth a large expenditure in programmer time, nor a large hit in performance.
A much better technique would be for you to use explicit life-cycle-management (like the with statement) for your objects.
I'm leaning strongly towards just not allowing temporary changes, which will also solve my problem. Thanks everyone for the feedback. ~Ethan~
participants (7)

- Antoine Pitrou
- Chris Kaynor
- Ethan Furman
- Greg Ewing
- Masklinn
- Michael Foord
- stoneleaf