From the manual [8.11]:
A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else.
This leads to a difference in behaviour between CPython and the other implementations: CPython will (currently) immediately destroy any objects that only have weak references to them, with the result that trying to access said object will require making a new one; other implementations (at least PyPy, and presumably the others that don't use reference-counting GCs) can "reach into the grave" and pull back objects that don't have any strong references left.

I would like the guarantees for weakrefs strengthened such that any weakref'ed object that has no strong references left will return None instead of the object, even if the object has not yet been garbage collected.

Without this stronger guarantee, programs that rely on weakrefs disappearing when strong refs are gone end up relying on the GC method instead, with the result that the program behaves differently on different implementations.

~Ethan~
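The difference described here can be sketched with a minimal example (the `Thing` class is a placeholder, not from any real code; the commented behaviour is implementation-dependent):

```python
import weakref

class Thing(object):
    pass

obj = Thing()
ref = weakref.ref(obj)
assert ref() is obj   # a strong reference still exists

del obj               # drop the last strong reference

# On CPython, refcounting destroys the object immediately and the
# weakref is cleared, so ref() returns None here.  On PyPy and other
# tracing-GC implementations, ref() may still return the original
# object until a collection actually runs.
print(ref())
```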
On Thu, 17 May 2012 08:10:40 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
From the manual [8.11]:
A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else.
This leads to a difference in behaviour between CPython and the other implementations: CPython will (currently) immediately destroy any objects that only have weak references to them with the result that trying to access said object will require making a new one;
This is only true if the object isn't caught in a reference cycle.
Without this stronger guarantee programs that are relying on weakrefs to disappear when strong refs are gone end up relying on the gc method instead, with the result that the program behaves differently on different implementations.
Why would they "rely on weakrefs to disappear when strong refs are gone"? What is the use case? Regards Antoine.
On Thu, May 17, 2012 at 8:44 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 17 May 2012 08:10:40 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
From the manual [8.11]:
A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else.
This leads to a difference in behaviour between CPython and the other implementations: CPython will (currently) immediately destroy any objects that only have weak references to them with the result that trying to access said object will require making a new one;
This is only true if the object isn't caught in a reference cycle.
To further this, consider the following example, run in CPython 2.6:

>>> import weakref
>>> import gc
>>> class O(object):
...     pass
...
>>> a = O()
>>> b = O()
>>> a.x = b
>>> b.x = a
>>> w = weakref.ref(a)
>>> del a, b
>>> print w()
<__main__.O object at 0x0000000003C78B38>
>>> gc.collect()
20
>>> print w()
None
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
Ethan Furman wrote:
I would like to have the guarantees for weakrefs strengthened such that any weakref'ed object that has no strong references left will return None instead of the object, even if the object has not yet been garbage collected.
Why do you want this guarantee? It would complicate implementations for which ref counting is not the native method of managing memory. -- Greg
On May 17, 8:10 am, Ethan Furman wrote:
From the manual [8.11]:
A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else.
This leads to a difference in behaviour between CPython and the other implementations: CPython will (currently) immediately destroy any objects that only have weak references to them with the result that trying to access said object will require making a new one; other implementations (at least PyPy, and presumably the others that don't use ref-count gc's) can "reach into the grave" and pull back objects that don't have any strong references left.
Antoine Pitrou wrote:
This is only true if the object isn't caught in a reference cycle.
Good point -- so I would like the proposed change in CPython as well.

Ethan Furman wrote:
I would like to have the guarantees for weakrefs strengthened such that any weakref'ed object that has no strong references left will return None instead of the object, even if the object has not yet been garbage collected.
Without this stronger guarantee programs that are relying on weakrefs to disappear when strong refs are gone end up relying on the gc method instead, with the result that the program behaves differently on different implementations.
Antoine Pitrou wrote:
Why would they "rely on weakrefs to disappear when strong refs are gone"? What is the use case?
Greg Ewing wrote:
Why do you want this guarantee? It would complicate implementations for which ref counting is not the native method of managing memory.
My dbf module provides direct access to dbf files. A retrieved record is a singleton object and allows temporary changes that are not written to disk. Whether those changes are seen by the next incarnation depends (I had thought) on whether or not the record with the unwritten changes has gone out of scope.

I see two questions that determine whether this change should be made:

1) How difficult would it be for the non-ref-counting implementations to implement?

2) Is it appropriate to have objects be changed but not saved, and then discarded when the strong references are gone so the next incarnation doesn't see the changes, even if the object hasn't been destroyed yet?

~Ethan~

FYI: For dbf I am going to disallow temporary changes, so this won't be an immediate issue for me.
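The record-cache pattern at issue can be sketched with `weakref.WeakValueDictionary` (the `Record` and `Table` classes below are hypothetical stand-ins, not the actual dbf API):

```python
import weakref

class Record(object):
    """Hypothetical stand-in for a dbf record (not the real dbf API)."""
    def __init__(self, index):
        self.index = index
        self.dirty_fields = {}   # temporary changes, never written to disk

class Table(object):
    """Sketch of a weakref-based singleton cache for records."""
    def __init__(self):
        # entries vanish when the last strong reference to a Record goes away
        self._cache = weakref.WeakValueDictionary()

    def get_record(self, index):
        rec = self._cache.get(index)
        if rec is None:
            rec = Record(index)      # a fresh "incarnation"
            self._cache[index] = rec
        return rec

table = Table()
a = table.get_record(0)
a.dirty_fields['name'] = 'temp'      # a temporary, unwritten change
del a                                # drop the last strong reference

# Under CPython's refcounting, the cache entry is cleared immediately and
# the next lookup builds a fresh Record without the temporary change;
# under a tracing GC, the old, still-modified Record may be handed back.
b = table.get_record(0)
```

Which branch you observe is exactly the implementation difference under discussion in this thread.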
On 2012-05-18, at 18:08 , stoneleaf wrote:
My dbf module provides direct access to dbf files. A retrieved record is a singleton object, and allows temporary changes that are not written to disk. Whether those changes are seen by the next incarnation depends on (I had thought) whether or not the record with the unwritten changes has gone out of scope.
If a record is a singleton, that singleton-ification would be handled through weakrefs would it not? In that case, until the GC is triggered (and the weakref is invalidated), you will keep getting your initial singleton and there will be no "next record", I fail to see why that would be an issue.
I see two questions that determine whether this change should be made:
1) How difficult it would be for the non-ref counting implementations to implement
Pretty much impossible I'd expect, the weakrefs can only be broken on GC runs (at object deallocation) and that is generally non-deterministic without specifying precisely which type of GC implementation is used. You'd need a fully deterministic deallocation model to ensure a weakref is broken as soon as the corresponding object has no outstanding strong (and soft, in some VMs like the JVM) reference.
2) Whether it's appropriate to have objects be changed, but not saved, and then discarded when the strong references are gone so the next incarnation doesn't see the changes, even if the object hasn't been destroyed yet.
If your saves are synchronized with the weakref being broken (the object being *effectively* collected) and the singleton behavior is as well, there will be no difference, I'm not sure what the issue would be, you might just have a second change cycle using the same unsaved (but still modified) object. Although frankly speaking such reliance on non-deterministic events would scare the shit out of me.
On May 18, 9:38 am, Masklinn wrote:
On 2012-05-18, at 18:08 , stoneleaf wrote:
My dbf module provides direct access to dbf files. A retrieved record is a singleton object, and allows temporary changes that are not written to disk. Whether those changes are seen by the next incarnation depends on (I had thought) whether or not the record with the unwritten changes has gone out of scope.
If a record is a singleton, that singleton-ification would be handled through weakrefs would it not?
Indeed, that is the current behavior.
In that case, until the GC is triggered (and the weakref is invalidated), you will keep getting your initial singleton and there will be no "next record", I fail to see why that would be an issue.
Because, since I had only been using CPython, I was able to count on records that had gone out of scope disappearing along with their _temporary_ changes. If I get that same record back the next time I loop through the table -- well, then the changes weren't temporary, were they?
I see two questions that determine whether this change should be made:
1) How difficult it would be for the non-ref counting implementations to implement
Pretty much impossible I'd expect, the weakrefs can only be broken on GC runs (at object deallocation) and that is generally non-deterministic without specifying precisely which type of GC implementation is used. You'd need a fully deterministic deallocation model to ensure a weakref is broken as soon as the corresponding object has no outstanding strong (and soft, in some VMs like the JVM) reference.
2) Whether it's appropriate to have objects be changed, but not saved, and then discarded when the strong references are gone so the next incarnation doesn't see the changes, even if the object hasn't been destroyed yet.
If your saves are synchronized with the weakref being broken (the object being *effectively* collected) and the singleton behavior is as well, there will be no difference, I'm not sure what the issue would be, you might just have a second change cycle using the same unsaved (but still modified) object.
And that's exactly the problem -- I don't want to see the modifications the second time 'round, and if I can't count on weakrefs invalidating as soon as the strong refs are gone I'll have to completely rethink how I handle records from the table.
Although frankly speaking such reliance on non-deterministic events would scare the shit out of me.
Indeed -- I hadn't realized that I was until somebody using PyPy noticed the problem. ~Ethan~
On 19 May 2012 03:54, stoneleaf <ethan@stoneleaf.us> wrote:
On May 18, 9:38 am, Masklinn wrote:
On 2012-05-18, at 18:08 , stoneleaf wrote:
My dbf module provides direct access to dbf files. A retrieved record is a singleton object, and allows temporary changes that are not written to disk. Whether those changes are seen by the next incarnation depends on (I had thought) whether or not the record with the unwritten changes has gone out of scope.
If a record is a singleton, that singleton-ification would be handled through weakrefs would it not?
Indeed, that is the current behavior.
In that case, until the GC is triggered (and the weakref is invalidated), you will keep getting your initial singleton and there will be no "next record", I fail to see why that would be an issue.
Because, since I had only been using CPython, I was able to count on records that had gone out of scope disappearing along with their _temporary_ changes. If I get that same record back the next time I loop through the table -- well, then the changes weren't temporary, were they?
So you're taking a *dependence* on the reference-counting garbage collection of the CPython implementation, and when that doesn't work for you on other implementations, trying to force the same semantics on them.

Your proposal can't reasonably be implemented by other implementations, as working out whether there are any references to an object is an expensive operation.

A much better technique would be for you to use explicit life-cycle management (like the with statement) for your objects.

Michael
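The suggestion above can be sketched as a context manager that gives a record an explicit lifetime, so temporary changes are discarded deterministically on every implementation (`Record` and `RecordSession` are hypothetical, not the dbf API):

```python
class Record(object):
    """Hypothetical stand-in for a dbf record (not the real dbf API)."""
    def __init__(self):
        self.changes = {}            # temporary, unwritten changes

    def discard_changes(self):
        self.changes.clear()

class RecordSession(object):
    """Scope a record's temporary changes to a `with` block; they are
    discarded on exit regardless of which garbage collector is in use."""
    def __init__(self, record):
        self._record = record

    def __enter__(self):
        return self._record

    def __exit__(self, exc_type, exc, tb):
        self._record.discard_changes()   # deterministic cleanup
        return False                     # don't swallow exceptions

rec = Record()
with RecordSession(rec) as r:
    r.changes['name'] = 'temp'   # visible only inside the block
assert rec.changes == {}         # wiped on exit, on any implementation
```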
I see two questions that determine whether this change should be made:
1) How difficult it would be for the non-ref counting implementations to implement
Pretty much impossible I'd expect, the weakrefs can only be broken on GC runs (at object deallocation) and that is generally non-deterministic without specifying precisely which type of GC implementation is used. You'd need a fully deterministic deallocation model to ensure a weakref is broken as soon as the corresponding object has no outstanding strong (and soft, in some VMs like the JVM) reference.
2) Whether it's appropriate to have objects be changed, but not saved, and then discarded when the strong references are gone so the next incarnation doesn't see the changes, even if the object hasn't been destroyed yet.
If your saves are synchronized with the weakref being broken (the object being *effectively* collected) and the singleton behavior is as well, there will be no difference, I'm not sure what the issue would be, you might just have a second change cycle using the same unsaved (but still modified) object.
And that's exactly the problem -- I don't want to see the modifications the second time 'round, and if I can't count on weakrefs invalidating as soon as the strong refs are gone I'll have to completely rethink how I handle records from the table.
Although frankly speaking such reliance on non-deterministic events would scare the shit out of me.
Indeed -- I hadn't realized that I was until somebody using PyPy noticed the problem.
~Ethan~
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html
On May 19, 5:33 am, Michael Foord wrote:
So you're taking a *dependence* on the reference counting garbage collection of the CPython implementation, and when that doesn't work for you with other implementations trying to force the same semantics on them.
I am not trying to force anything. I stated what I would like, and followed up with questions to further the discussion.
Your proposal can't reasonably be implemented by other implementations as working out whether there are any references to an object is an expensive operation.
Then that nixes it. The (debatable) advantages aren't worth a large expenditure in programmer time, nor a large hit in performance.
A much better technique would be for you to use explicit life-cycle-management (like the with statement) for your objects.
I'm leaning strongly towards just not allowing temporary changes, which will also solve my problem. Thanks everyone for the feedback. ~Ethan~
participants (7)

- Antoine Pitrou
- Chris Kaynor
- Ethan Furman
- Greg Ewing
- Masklinn
- Michael Foord
- stoneleaf