[Python-Dev] object equality vs identity, in and dicts idioms and speed

Martin v. Loewis martin@v.loewis.de
Thu, 3 Jan 2002 23:56:38 +0100


> Now what is the fastest idiom equivalent to:
> 
> obj in list
> 
> when I'm interested in identity (is) and not equality?

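It appears that a plain for loop is fastest; see the attached script
below. On my system, it gives the following (each pair of columns is
the value searched for and the elapsed time in seconds for 100
repetitions):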

m1         0   0.00      5000   0.29      9999   0.60       1.0   0.61
m2         0   0.60      5000   0.61      9999   0.62       1.0   0.62
m3         0   1.81      5000   1.81      9999   1.81       1.0   1.83
m4         0   0.00      5000   1.54      9999   3.11       1.0   3.17
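
For reference, the loop timed as m1 can be wrapped in a small helper. This is only a sketch: the name contains_identical is made up for illustration and is not an existing builtin or API, and it is written so that it also runs on current Python versions.

```python
def contains_identical(obj, seq):
    """Return True if some element of seq is the very same object as obj."""
    for item in seq:
        if item is obj:      # identity test, not ==
            return True      # stop at the first hit, as m1 does with break
    return False


a = [1, 2]
b = [1, 2]          # equal to a, but a different object
items = [a, "x"]

print(b in items)                      # True: 'in' accepts an equal element
print(contains_identical(b, items))    # False: no element is b itself
print(contains_identical(a, items))    # True
```

On Python versions that have any(), the body could presumably be shortened to any(item is obj for item in seq); whether that is faster than the explicit loop would have to be measured.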


> Although my experience says that the equality case is the most
> common, I wonder whether some direct support for the identity case
> isn't worthwhile, because it is rare but typically then you would
> like some speed.

In Smalltalk, such things would be done in specialized
containers. E.g. the IdentityDictionary is a dictionary where keys are
considered equal only if identical. Likewise, you could have a
specialized list type. OTOH, if you need speed, just write an
extension module - writing an identical_in function is straightforward.
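
To make the container idea concrete, here is a rough pure-Python sketch of an identity-keyed dictionary. The class name IdentityDict and its layout are assumptions made for illustration; it is not a port of Smalltalk's IdentityDictionary and not something that exists in the standard library.

```python
class IdentityDict:
    """Mapping whose keys match only if they are the same object (sketch)."""

    def __init__(self):
        # Maps id(key) -> (key, value). Storing the key keeps it alive,
        # so its id cannot be reused by another object while the entry
        # exists.
        self._data = {}

    def __setitem__(self, key, value):
        self._data[id(key)] = (key, value)

    def __getitem__(self, key):
        try:
            return self._data[id(key)][1]
        except KeyError:
            raise KeyError(key)

    def __contains__(self, key):
        return id(key) in self._data

    def __len__(self):
        return len(self._data)


k1 = [1, 2]
k2 = [1, 2]    # equal to k1, but a distinct object

d = IdentityDict()
d[k1] = "first"
d[k2] = "second"    # does not overwrite the k1 entry
```

Because only id() of the key is hashed, keys do not need to be hashable themselves, which is why lists can be used here.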

I'd hesitate to add identical_in to the API, since it would then need
to be supported for every container, the same way sq_contains is
now.
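
As an illustration of the per-container hook: at the Python level the 'in' operator goes through __contains__, which corresponds to the sq_contains slot for types written in C. A specialized list type could override it as sketched below. IdentityList is a hypothetical name, and the sketch changes only the 'in' test.

```python
class IdentityList(list):
    """List whose 'in' test uses identity instead of equality (sketch)."""

    def __contains__(self, obj):
        for item in self:
            if item is obj:
                return True
        return False


a = [1]
b = [1]    # equal to a, but a different object

plain = [a]
ident = IdentityList([a])

print(b in plain)    # True: the ordinary list accepts an equal element
print(b in ident)    # False: b itself is not stored
print(a in ident)    # True
```

Methods such as index() and count() still compare by equality in this sketch, which hints at how much every container would have to support if identity lookup became part of the general API.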

Regards,
Martin

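import time

x = range(10000)      # list of ints to search
rep = [None] * 100    # each method repeats its search 100 times

# Search targets: first, middle, and last element, plus 1.0, which is
# never identical to any element (full scan, no match).
values = x[0], x[5000], x[-1], 1.0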

def m1(val, rep=rep, x=x):
    for r in rep:
        found = 0
        for s in x:
            if s is val:
                found = 1
                break

def m2(val, rep=rep, x=x):
    for r in rep:
        found = [s for s in x if s is val]

def m3(val, rep=rep, x=x):
    for r in rep:
        def identical(elem):
            return elem is val
        found = filter(identical, x)

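# Wrapper whose __eq__ is an identity test, so that the equality-based
# 'in' operator ends up comparing by identity.
class Contains:
    def __init__(self, val):
        self.val = val
    def __eq__(self, other):
        return self.val is other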
    
def m4(val, rep=rep, x=x):
    for r in rep:
        found = Contains(val) in x

for options in [m1, m2, m3, m4]:
    print options.__name__,
    for val in values:
        start = time.time()
        options(val)
        end = time.time()
        print "%9s %6.2f" % (val,end-start),
    print