object equality vs identity, in and dicts idioms and speed

Hi, [Ok this is maybe more a comp.lang.python thing but ...] If I'm correct dictionaries are based on equality and so the "in" operator. AFAIK if I'm interested in a dictionary working on identity I should wrap my objects ... Now what is the fastest idiom equivalent to: obj in list when I'm interested in identity (is) and not equality? That was the comp.lang.python part, now my impression is that in any case when I'm interested in identity and not equality I have to workaround, that means I will never directly have the performace of the equality idioms. Although my experience say that the equality case is the most common, I wonder whether some directy support for the identity case isn't worth, because it is rare but typically then you would like some speed. [Yes, I have some concrete context but this is long so unless strictly requested ...] Am I missing something? Opinions. regards.

PS: I know that equality for user classes defaults to identity. But I'm obviously interested to the case when equality has been possibly redefined and I still need identity. Thanks. ----- Original Message ----- From: Samuele Pedroni <pedronis@bluewin.ch> To: <python-dev@python.org> Sent: Thursday, January 03, 2002 5:43 PM Subject: [Python-Dev] object equality vs identity, in and dicts idioms and speed
Hi,
[Ok this is maybe more a comp.lang.python thing but ...]
If I'm correct dictionaries are based on equality and so the "in" operator.
AFAIK if I'm interested in a dictionary working on identity I should wrap my objects ...
Now what is the fastest idiom equivalent to:
obj in list
when I'm interested in identity (is) and not equality?
That was the comp.lang.python part, now my impression is that in any case when I'm interested in identity and not equality I have to workaround, that means I will never directly have the performace of the equality idioms. Although my experience say that the equality case is the most common, I wonder whether some directy support for the identity case isn't worth, because it is rare but typically then you would like some speed. [Yes, I have some concrete context but this is long so unless strictly requested ...]
Am I missing something? Opinions.
regards.
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev

Now what is the fastest idiom equivalent to:
obj in list
when I'm interested in identity (is) and not equality?
It appears that doing a plain for loop is fastest, see the attached script below. On my system,it gives m1 0 0.00 5000 0.29 9999 0.60 1.0 0.61 m2 0 0.60 5000 0.61 9999 0.62 1.0 0.62 m3 0 1.81 5000 1.81 9999 1.81 1.0 1.83 m4 0 0.00 5000 1.54 9999 3.11 1.0 3.17
Although my experience say that the equality case is the most common, I wonder whether some directy support for the identity case isn't worth, because it is rare but typically then you would like some speed.
In Smalltalk, such things would be done in specialized containers. E.g. the IdentityDictionary is a dictionary where keys are considered equal only if identical. Likewise, you could have a specialized list type. OTOH, if you need speed, just write an extension module - doing a identical_in function is straight-forward. I'd hesitate to add identical_in to the API, since it would mean that it needs to be supported for any container, the same sq_contains works now. Regards, Martin import time x = range(10000) rep = [None] * 100 values = x[0], x[5000], x[-1], 1.0 def m1(val, rep=rep, x=x): for r in rep: found = 0 for s in x: if s is val: found = 1 break def m2(val, rep=rep, x=x): for r in rep: found = [s for s in x if s is val] def m3(val, rep=rep, x=x): for r in rep: def identical(elem): return elem is val found = filter(identical, x) class Contains: def __init__(self, val): self.val = val def __eq__(self, other): return self.val is other def m4(val, rep=rep, x=x): for r in rep: found = Contains(val) in x for options in [m1, m2, m3, m4]: print options.__name__, for val in values: start = time.time() options(val) end = time.time() print "%9s %6.2f" % (val,end-start), print

[Martin v. Loewis]
Now what is the fastest idiom equivalent to:
obj in list
when I'm interested in identity (is) and not equality?
It appears that doing a plain for loop is fastest, see the attached script below. On my system,it gives
m1 0 0.00 5000 0.29 9999 0.60 1.0 0.61 m2 0 0.60 5000 0.61 9999 0.62 1.0 0.62 m3 0 1.81 5000 1.81 9999 1.81 1.0 1.83 m4 0 0.00 5000 1.54 9999 3.11 1.0 3.17
Thanks, and, sorry I could have done the measuraments myself but I supposed that maybe someone should already know. The result makes also sense, is the version that does less consing and calling user python functions. But only profiling knows the truth <wink>.
Although my experience say that the equality case is the most common, I wonder whether some directy support for the identity case isn't worth, because it is rare but typically then you would like some speed.
In Smalltalk, such things would be done in specialized containers. E.g. the IdentityDictionary is a dictionary where keys are considered equal only if identical. Likewise, you could have a specialized list type. OTOH, if you need speed, just write an extension module - doing a identical_in function is straight-forward.
Is not really my code, but yes writing an extension (especially in jython) would be not too difficult but see below.
I'd hesitate to add identical_in to the API, since it would mean that it needs to be supported for any container, the same sq_contains works now.
I see the problem. Implicitly I was asking whether adding builtin-in identity_list, identity_dict and corresponding weak versions for the dicts could make sense or is just code bloat. The context is anygui (www.anygui.org), I'm following it closely and I try to help with jython/swing issues. Anygui has code like this in the event handling logic: source_stack.append(id(source)) try: ... if not loop and not r.loop \ and id(obj) in source_stack: continue ... finally: source_stack.pop() Now this is a nice idiom and workarounds the identity-list problem, but mmh ... id is broken under jython (that means different objects can get the same id :( ) , also in 2.1 final. It is a long-standing bug and yes we are about to solve it but there is a trade-off and jython id will become precise but many times slower wrt to CPython version (we need to implement a weak-key-table :( ). An identity_list would make for a portable idiom with comparable overhead and will give to the identity case somehow the same speed of the equality case... And further anygui shows also a possible need for a WeakKeyIdentityDict... regards.

An identity_list would make for a portable idiom with comparable overhead and will give to the identity case somehow the same speed of the equality case...
And further anygui shows also a possible need for a WeakKeyIdentityDict...
Well, I'd say this is a clear indication that this has to go the path that all library extensions go (or should go): They are implemented in one project, then are used in other projects as well, until finally somebody submits the implementation to the Python core. In the case of anygui, I'd suggest to include different implementations of the identity_list, and any other specialised container you may have: - one implementation for C python that works across all Python versions (in C) - if useful, one implementation for Python 2.2 using type inheritance, in C, alternatively, one implementation in pure Python: class identity_list(list): def __contains__(self, elem): for i in self: if i is elem: return 1 return 0 # need to implement count, index, remove It turns out that this class is as fast in my benchmark than the Python loop over a builtin list, which is not surprising, as it is the same loop. - one implementation in Java for use with Jython. - one implementation in pure Python which works across all Python versions. The configuration mechanics of anygui should then select an appropriate version. Experience will tell which of those implementations are used in practice, and which are of use to other packages. That will eventually give a foundation for including one of them into the core libraries. People tend to invent new containers all the time (and new methods for existing containers), and I believe we have to resist the tempation of including them into the language at first sight. Just make sure that you do *not* put those containers into the location where the Python library will eventually put them, as well; instead if the core provides them, have the configuration mechanics figure to use the builtin type, instead of the anygui-included fallback implementation. Regards, Martin

An identity_list would make for a portable idiom with comparable overhead and will give to the identity case somehow the same speed of the equality case...
And further anygui shows also a possible need for a WeakKeyIdentityDict...
Well, I'd say this is a clear indication that this has to go the path that all library extensions go (or should go): They are implemented in one project, then are used in other projects as well, until finally somebody submits the implementation to the Python core.
In the case of anygui, I'd suggest to include different implementations of the identity_list, and any other specialised container you may have:
- one implementation for C python that works across all Python versions (in C) - if useful, one implementation for Python 2.2 using type inheritance, in C, alternatively, one implementation in pure Python:
class identity_list(list): def __contains__(self, elem): for i in self: if i is elem: return 1 return 0
# need to implement count, index, remove
It turns out that this class is as fast in my benchmark than the Python loop over a builtin list, which is not surprising, as it is the same loop.
- one implementation in Java for use with Jython.
- one implementation in pure Python which works across all Python versions.
The configuration mechanics of anygui should then select an appropriate version.
Experience will tell which of those implementations are used in practice, and which are of use to other packages. That will eventually give a foundation for including one of them into the core libraries. People tend to invent new containers all the time (and new methods for existing containers), and I believe we have to resist the tempation of including them into the language at first sight.
I won't argue about that.
Just make sure that you do *not* put those containers into the location where the Python library will eventually put them, as well; instead if the core provides them, have the configuration mechanics figure to use the builtin type, instead of the anygui-included fallback implementation.
In this case the above "you" is fully undefined. I will archive this discussion for better times when I have spare-cycles. Anygui people is commited to ship just pure python code, and I'm not really a developer for the project, just a jython "consultant". So I will just workaround otherwise, I already knew that, this was mostly a survey, a valuable one and thanks for the answers. My band-width in the near future is for helping with Jython 2.2 and other personal stuff ... Thanks, Samuele.

[Samuele Pedroni]
... but mmh ... id is broken under jython (that means different objects can get the same id :( ) , also in 2.1 final. It is a long-standing bug and yes we are about to solve it but there is a trade-off and jython id will become precise but many times slower wrt to CPython version (we need to implement a weak-key-table :( ).
Mapping what to what? A fine implementation of id() would be to hand each new object a unique Java int from a global counter, incremented once per Python object creation -- or a Java long if any JVM stays up long enough that 32 bits is an issue <wink>.
participants (3)
-
Martin v. Loewis
-
Samuele Pedroni
-
Tim Peters