[Python-Dev] object equality vs identity, in and dicts idioms and speed
Samuele Pedroni
Samuele Pedroni" <pedroni@inf.ethz.ch
Fri, 4 Jan 2002 01:50:19 +0100
[Martin v. Loewis]
> > Now what is the fastest idiom equivalent to:
> >
> > obj in list
> >
> > when I'm interested in identity (is) and not equality?
>
> It appears that doing a plain for loop is fastest, see the attached
> script below. On my system,it gives
>
> m1 0 0.00 5000 0.29 9999 0.60 1.0 0.61
> m2 0 0.60 5000 0.61 9999 0.62 1.0 0.62
> m3 0 1.81 5000 1.81 9999 1.81 1.0 1.83
> m4 0 0.00 5000 1.54 9999 3.11 1.0 3.17
Thanks, and, sorry I could have done the measuraments myself
but I supposed that maybe someone should already know.
The result makes also sense, is the version that does less consing
and calling user python functions. But only profiling knows
the truth <wink>.
>
> > Although my experience say that the equality case is the most
> > common, I wonder whether some directy support for the identity case
> > isn't worth, because it is rare but typically then you would like
> > some speed.
>
> In Smalltalk, such things would be done in specialized
> containers. E.g. the IdentityDictionary is a dictionary where keys are
> considered equal only if identical. Likewise, you could have a
> specialized list type. OTOH, if you need speed, just write an
> extension module - doing a identical_in function is straight-forward.
Is not really my code, but yes writing an extension (especially in
jython) would be not too difficult but see below.
> I'd hesitate to add identical_in to the API, since it would mean that
> it needs to be supported for any container, the same sq_contains works
> now.
I see the problem. Implicitly I was asking whether adding builtin-in
identity_list, identity_dict and corresponding weak versions for the dicts
could make sense or is just code bloat.
The context is anygui (www.anygui.org), I'm following it closely
and I try to help with jython/swing issues.
Anygui has code like this in the event handling logic:
source_stack.append(id(source))
try:
...
if not loop and not r.loop \
and id(obj) in source_stack: continue
...
finally:
source_stack.pop()
Now this is a nice idiom and workarounds the identity-list problem,
but mmh ... id is broken under jython (that means
different objects can get the same id :( ) , also in 2.1 final.
It is a long-standing bug
and yes we are about to solve it but there is a trade-off and jython
id will become precise but many times slower wrt to CPython version
(we need to implement a weak-key-table :( ).
An identity_list would make for a portable idiom with comparable
overhead and will give to the identity case somehow the same speed
of the equality case...
And further anygui shows also a possible need for a WeakKeyIdentityDict...
regards.