__lt__ slowing the "in" operator even if not called
Emanuele Aina
emanuele.aina at gmail.com
Thu Jun 15 08:14:31 EDT 2006
Maric Michaud spiegò:
> Le Mercredi 14 Juin 2006 22:57, Emanuele Aina a écrit :
> > Here you can see that even with only the __lt__ method it goes 10x
> > slower, but __lt__ is never called as "Foo" is not printed.
>
> No, that is not what it shows. The only thing it shows is that in operator is
> slow for lists, and as it iterates over the entire list untill it find an
> element, it is much slower when it fails a lot.
Yes, I now that "in" is slow for lists and that a dict is preferable,
but what I'm pointing out is that in one case is much slower without a
reason.
In my test, the one without __lt__ runs in ~500ms, while the one with
the __lt__ method runs in ~5000ms.
Just for comparision I've also tested with an overridden __eq__ and it
shows timings similar to the ones in the second test, but this time it
is expected, as "in" has to call __eq__ for every element in the list.
In the first test python uses the native equality (comparing memory
pointer) and it is fast because it is implemented in C in the
interpreter (no python function call).
In the last case python has to call __eq__ for every element, and a
python function call is obviuosly more expensive than the native
comparision.
But the __lt__ case leaves me wondering why it is slow as the __eq__
case, while using the native comparision.
What I asked is what is the difference between the first and the second
case?
> Use dict in operator instead.
Yes, I know that a dict is the right datastructure for a fast "in", but
I need to keep track of the ordering, so I cannot use a dict.
> Here is a program to show this in details :
>
> import timing
>
> class State(object):
> def __init__(self, value):
> self.v = value
>
> class StateEQ(State):
> def __eq__ (self, other):
> if isinstance(other, State) :
> return self.v == other.v
> else :
> return self.v == other
>
> class StateLT(State) :
> def __lt__ (self, other):
> if isinstance(other, State) :
> return self.v < other.v
> else :
> return self.v < other
>
> class StateLTEQ(StateEQ, StateLT) : pass
>
> def test(state_cls):
> num_elem = 3*10**4
> print
> print state_cls
>
> l = [state_cls(0)]
> for i in xrange(num_elem): l.append(state_cls(i+1))
> print
> print "in operator at the beginning of list:",
> timing.start()
> for i in xrange(300): i in l
> timing.finish()
> print timing.milli()
> print "in operator at the end of list:",
> timing.start()
> for i in xrange(num_elem-300, num_elem): i in l
> timing.finish()
> print timing.milli()
Sorry, but this works only when you override __eq__.
Without it it will alway scan the entire list, so you are comparing the
timings of two different things.
> print
> print "converting to dict :",
> timing.start()
> d = {}.fromkeys(l)
> timing.finish()
> print timing.milli()
> print "in operator for a dict for %s elements:" % (num_elem*2),
> timing.start()
> for i in xrange(num_elem, num_elem*2): i in d
> timing.finish()
> print timing.milli()
As I said, that was only a test demo exploiting my observation, in my
real program I need to keep track of the order in which I add items in
the list, so I can't use a dict.
> for which the output is :
>
> <class '__main__.State'>
>
> in operator at the beginning of list: 1983
> in operator at the end of list: 2011
>
> converting to dict : 82
> in operator for a dict for 60000 elements: 12
>
> <class '__main__.StateLT'>
>
> in operator at the beginning of list: 3866
> in operator at the end of list: 3871
Here python is using the equality comparator from "int", giving those
fast results.
I've substituted the ints with instances of state_cls:
- for i in xrange(300): i in l
+ for i in xrange(300): state_cls(i) in l
- for i in xrange(num_elem-300, num_elem): i in l
+ for i in xrange(num_elem-300, num_elem): state_cls(i) in l
Running these test, in the case of StateLT, now takes ~10000ms and
~9000ms.
[...]
> Every classes that define the __eq__ operator will find quickly the elements
> if they are at the beginning of the list.
> If they are at the end, the in oprerator is slower in these classes because of
> the overhead of function calls (in StateLt there is also an overhead due to
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> internal lookup, IMO).
^^^^^^^^^^^^^^^
Yes, that was my question! :)
But I hoped in a more exaustive answer: why python has to do this
lookup when the __lt__ method is not involved at all?
More information about the Python-list
mailing list