On 07/11/2011 04:21 PM William ML Leslie wrote:
On 11 July 2011 23:21, Bengt Richter<bokr@oz.net> wrote:
On 07/11/2011 01:36 PM William ML Leslie wrote:
On 11 July 2011 20:29, Bengt Richter<bokr@oz.net> wrote:
On 07/10/2011 09:13 PM Laura Creighton wrote:
What do we want to happen when somebody -- say in a C extension -- takes the id of an object that is scheduled to be removed when the gc next runs?
IMO taking the id should increment the object ref counter and prevent the garbage collection, until the id value itself is garbage collected.
This significantly changes the meaning of id() in a way that will break existing code.
Do you have an example of existing code that depends on the integer-cast value of a dangling pointer??
I mean that id creating a reference will break existing code. id() has always returned an integer, and the existence of some integer in some python code has never prevented some otherwise unrelated object from being collected. Existing code will not make sure that it cleans up the return value of id(), as nowhere has id() ever kept a reference to the object passed in.
Ok, d'oh ;-/
I was focused on making sure the id value "referred" to an existing live object *when returned from id*. It is of course live when passed to id while bound in id's argument, but if that is the *only* binding, then the object is *guaranteed* to be garbage when id returns the integer. That integer is then IMO meaningless except as a debugging peek at implementation, and it would be an *error* for a program to depend on its value.

    [10:12 ~]$ python -c 'import this'|grep -A1 Errors
    Errors should never pass silently.
    Unless explicitly silenced.

You are right that existing code could, and some probably would, break if id guarantees validity of the integer by holding the object. So I will go with the first alternative I mentioned in my reply to Armin, and focus on preventing return of the id of garbage rather than the "or else..." option, which is impractical and is likely to break code, as you say. <excerpt pasted as quote>
Letting the expression result die and returning a kind of pointer to where the result object *was* seems like a dangling pointer problem, except I guess you can't dereference an id value (without hackery).
Maybe id should raise an exception if the argument referenced only has a ref count of 1 (i.e., just the reference from the argument list)?
Or else let id be a class and return a minimal instance only binding the passed object, and customize the compare ops to take into account type diffs etc.? Then there would be no id values without corresponding objects, and id values used in expressions would die a natural death, along with their references to their objects -- whether "variables" or expressions.
Sorry to belabor the obvious ;-) </excerpt>
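For readers who want to picture the "let id be a class" option from the excerpt, here is a minimal sketch. It is not the example referred to elsewhere in this thread (that code is not reproduced here); the class name `ObjId` and all of its methods are hypothetical, chosen only to illustrate an id value that keeps its object alive and compares by identity.

```python
class ObjId(object):
    """Hypothetical id-like handle that holds a strong reference to its object."""
    __slots__ = ("_obj",)

    def __init__(self, obj):
        # Holding the object means this handle can never name garbage.
        self._obj = obj

    def __eq__(self, other):
        # Only handles compare equal, and only when they wrap the same object.
        if not isinstance(other, ObjId):
            return NotImplemented
        return self._obj is other._obj

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        return id(self._obj)

    def __int__(self):
        # Explicit escape hatch for code that wants the plain integer.
        return id(self._obj)

    def __repr__(self):
        return "ObjId(%d)" % id(self._obj)


a = [1, 2]
b = [1, 2]
same = ObjId(a) == ObjId(a)        # same underlying object
different = ObjId(a) == ObjId(b)   # equal lists, distinct objects
```

As the thread notes, a handle like this is not an integer, so code that formats ids with % as decimal or hex would have to call int() first. That is part of why the idea is treated as impractical for current Python.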
Rather than an exception, perhaps returning None would suffice, analogous to a null pointer where no valid pointer can be returned. That should be cheap. It could also be used in answer to Laura's question, to which I only proposed the impractical id object.
I know that you are suggesting that id returns something that is /not/ an integer, but that is also a language change. People have always been able to assume that they can % format ids as decimals or hexadecimals.
I thought of subclassing int, but was reaching for an id abstraction more than a practical thing, sorry. But never mind the id-as-object idea for current python ;-)
Or do you mean that id's must be allowed to be compared == to integers, which my example prohibits? (I didn't define __cmp__, BTW, just lazy ;-)
Good, __cmp__ has been deprecated for over 10 years now.
The only sensible sort on id's I can think of off hand would be if id's carried a time stamp.
If you want an object reference, just use one. If you want them to be persistent, build a dictionary from id to object.
Yes, dictionary is one way to bind an object and thus make sure its id is valid.
But it would be overkill to use a dictionary to guarantee object id persistence just for the duration of an expression such as id(x.a) == id(y.a)
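The "dictionary from id to object" suggestion might look roughly like the sketch below. The names `_live`, `remember`, `lookup` and `forget` are made up for illustration; the only point is that the dictionary holds a reference, so each stored id keeps naming a live object until the entry is removed.

```python
_live = {}  # hypothetical registry: id(obj) -> obj


def remember(obj):
    """Store obj under its id so that the id stays valid while the entry exists."""
    key = id(obj)
    _live[key] = obj
    return key


def lookup(key):
    """Return the object registered under key, or None if there is none."""
    return _live.get(key)


def forget(key):
    """Drop the entry; after this the integer may again refer to garbage."""
    return _live.pop(key, None)


k1 = remember([1])
k2 = remember([1])  # equal value, but a distinct object that is also kept alive
```

Because both lists stay alive in the registry, `k1` and `k2` cannot collide. That also shows the cost being called overkill here: every id taken this way needs an explicit `forget` later.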
But id is not about persistence. The lack of persistence is one of its key features.
That said, I do think id()'s current behaviour is overkill. I just don't think we can change it in a way that will fit existing usage. And cleaning it up properly is far too much work.
How about just returning None when id sees an object which no other code will be able to see when id returns (hence making the integer the id of garbage)? <snip>
The definition of id(), according to docs.python.org, is:
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
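The clause about non-overlapping lifetimes is what makes an expression like id(x.a) == id(y.a) fragile when the attribute produces a fresh object on each access. A small sketch, with a hypothetical `Holder` class whose property returns a new object every time:

```python
class Holder(object):
    """Hypothetical class whose attribute yields a new temporary on each access."""

    @property
    def a(self):
        return object()


x, y = Holder(), Holder()

# Fragile: each temporary may already be dead by the time its id is compared,
# so the two integers may or may not coincide depending on the implementation.
fragile = id(x.a) == id(y.a)

# Robust: bind both results first so that their lifetimes overlap.
xa, ya = x.a, y.a
robust = id(xa) == id(ya)
```

With both objects bound, the docs' guarantee applies and `robust` agrees with `xa is ya`. No such guarantee covers `fragile`, so a program should not depend on its value.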
Also, a new id could live alongside the old ;-)
Hm, I couldn't find that, googling <a few strings from the above> site:python.org. Nor at site:docs.python.org. Maybe from a non-current version of docs? But never mind.
It's just that the problems you are attempting to fix are already solved, and they are only vaguely related to what a python programmer understands id() to mean. If, according to cpython, "1003 is not 1000 + 3", then programmers can't rely on any excellent new behaviour for id() *anyway*.
My question to Armin was whether doing what cpython 2.7 does meant following the vagaries of possible optimizations. E.g., if space for constants were slightly modified, cpython would return False for "1003 is not 1000 + 3". 1000+3 is apparently already folded to a constant 1003, but local constants are currently allowed to be duplicated, as you see in the disassembly of your example:

    >>> from ut.miscutil import disev
    >>> 1003 is not 1000 + 3
    True
    >>> disev("1003 is not 1000 + 3")
      1           0 LOAD_CONST               0 (1003)
                  3 LOAD_CONST               3 (1003)
                  6 COMPARE_OP               9 (is not)
                  9 RETURN_VALUE

It would seem you could generate quite a few equivalent constants:

    >>> disev('[1000+3,1000+3,1000+3,1000+3,1000+3]')
      1           0 LOAD_CONST               2 (1003)
                  3 LOAD_CONST               3 (1003)
                  6 LOAD_CONST               4 (1003)
                  9 LOAD_CONST               5 (1003)
                 12 LOAD_CONST               6 (1003)
                 15 BUILD_LIST               5
                 18 RETURN_VALUE

which sooner or later someone will probably find a reason to optimize for space, and what does that mean for the *"language"* definition of id?
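The `disev` helper comes from the poster's own `ut.miscutil` module, which is not generally available. A rough stand-in using only the standard `dis` module could look like the sketch below. The function names `compile_expr` and `show` are hypothetical, and the assumption that `disev` disassembles an expression compiled in 'eval' mode is a guess from its output. Opcode names, offsets and the number of duplicated constants will differ between interpreter versions, so the listings in this thread should not be expected verbatim.

```python
import dis
import warnings


def compile_expr(source):
    """Compile an expression string in 'eval' mode and return the code object."""
    with warnings.catch_warnings():
        # Newer CPython versions warn about "is" comparisons with literals.
        warnings.simplefilter("ignore", SyntaxWarning)
        return compile(source, "<expr>", "eval")


def show(source):
    """Print the bytecode and the constants table for an expression."""
    code = compile_expr(source)
    dis.dis(code)
    print("co_consts: %r" % (code.co_consts,))
    return code


code = show("1003 is not 1000 + 3")

# Count how many separate slots in the constants table hold the value 1003.
slots_holding_1003 = sum(
    1 for const in code.co_consts if type(const) is int and const == 1003
)
```

Whether `slots_holding_1003` comes out as one or more than one is exactly the implementation detail under discussion: it decides what `is` and `id()` report for two textually separate constants.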
OTOH, the "identity may not even be preserved for primitive types" issue is an observable difference to cpython and is fixable, even if it is a silly thing to rely on.
Apparently the folding of expressions yielding e.g. small integers involves generating a reference to the single instance. Hm. I downloaded pypy and it does optimize constant storage for 1003 is 1000+3:

    [11:03 ~]$ pypy
    pypy: /usr/lib/libcrypto.so.0.9.8: no version information available (required by pypy)
    pypy: /usr/lib/libssl.so.0.9.8: no version information available (required by pypy)
    Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:38)
    [PyPy 1.5.0-alpha0 with GCC 4.4.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    And now for something completely different: ``psyco eats one brain per inch of progress''
    >>>> 1003 is 1000+3
    True
    >>>> from ut.miscutil import disev
    >>>> disev('1003 is 1000+3')
      1           0 LOAD_CONST               0 (1003)
                  3 LOAD_CONST               0 (1003)
                  6 COMPARE_OP               8 (is)
                  9 RETURN_VALUE

Let's see what the id values are:

    >>>> id(1003), id(1000+3)
    (-1216202084, -1216202084)
    >>>> disev('id(1003), id(1000+3)')
      1           0 LOAD_NAME                0 (id)
                  3 LOAD_CONST               0 (1003)
                  6 CALL_FUNCTION            1
                  9 LOAD_NAME                0 (id)
                 12 LOAD_CONST               0 (1003)
                 15 CALL_FUNCTION            1
                 18 BUILD_TUPLE              2
                 21 RETURN_VALUE

Vs cpython 2.7.2:

    >>> id(1003), id(1000+3) # different garbage ;-)
    (136814932, 136814848)
    >>> disev('id(1003), id(1000+3) # different garbage ;-)')
      1           0 LOAD_NAME                0 (id)
                  3 LOAD_CONST               0 (1003)
                  6 CALL_FUNCTION            1
                  9 LOAD_NAME                0 (id)
                 12 LOAD_CONST               3 (1003)
                 15 CALL_FUNCTION            1
                 18 BUILD_TUPLE              2
                 21 RETURN_VALUE

Of course, the id's are all still id's of garbage locations once returned from id ;-) So how about returning None instead of id's of garbage, or raising an exception? Would that not be pythonic?

Regards,
Bengt Richter