[New-bugs-announce] [issue21234] __contains__ and friends should check "is" for all elements first

Jurjen N.E. Bos report at bugs.python.org
Tue Apr 15 15:04:23 CEST 2014


New submission from Jurjen N.E. Bos:

It all started when adding an __equals__ method to a self made class.
The effect was that my program slowed down enormously, which came as a surprise.

This is because when doing an operation that does linear search through a container, the interpreter checks all items one by one, first checking identity and then equality.
If the __equals__ method is slow, this is slower than needed.

In python, you effectively get:
class myContainer:
  def __contains__(self, obj):
    for i in range(len(self)):
      if self[i] is obj: return True
      if self[i]==obj: return True
    return False

My proposal is to change this to:
class myContainer:
  def __contains__(self, obj):
    for i in range(len(self)):
      if self[i] is obj: return True
    for i in range(len(self)):
      if self[i]==obj: return True
    return False

The net effect would be approximately:
- if the object is exactly in the container, the speedup is significant.
- if the object is not in the container, there is no difference.
- if an object is in the container that is equal to the object, it will be slower, but not very much.

In the normal use case, this will probably feel to the user as a speed improvement, especially when the __equals__ is slow.

I tried to find cases in which this would change behaviour of programs, but I couldn't find any. If this _does_ change behaviour in some noticeable and unwanted way, let me know! (Documenting would also be a good idea, then.)

The accompanying file gives some timings to show what I mean, e.g.
Time in us to find an exact match (begin, middle, end):
0.042335559340708886 31.610660936887758 62.69573781716389
Time in us to find an equal match (begin, middle, end):
0.3730294263293299 31.421928646805195 63.177373531221896
Time if not found:
63.44531546956001
And now for an object that has no __equal__:
Time in us to find a thing (begin, middle, end):
0.03555453901338268 9.878883646121661 19.656711762284473
Time if not found:
19.676395048315776

(Python 2.7 does something completely different with objects without __equal__, so the test gives quite different results.)

----------
components: Interpreter Core
files: containsDemo.py
messages: 216286
nosy: jneb
priority: normal
severity: normal
status: open
title: __contains__ and friends should check "is" for all elements first
type: performance
versions: Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 3.5
Added file: http://bugs.python.org/file34867/containsDemo.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21234>
_______________________________________


More information about the New-bugs-announce mailing list