[Python-Dev] Re: Sets

Alex Martelli aleax@aleax.it
Thu, 18 Jul 2002 22:52:31 +0200


On Thursday 18 July 2002 09:50 pm, Guido van Rossum wrote:
> > > I believe I recommended to Greg to make sets "have" a dict instead of
> > > "being" dicts, and I think he agreed.  But I guess he never got to
> > > implementing that change.
> >
> > Right.  OK, guess I'll make a new patch using delegation instead
> > of inheritance, then.
>
> Maybe benchmark the performance too.  If the "has" version is much
> slower, perhaps we could remove unwanted interfaces from the public
> API by overriding them with something that raises an exception (and
> rename the internal versions to some internal name if they are
> needed).

I've just updated patch 580995 with the has-A rather than is-A version.
OK, I'll now run some simple benchmarks...

Looks good, offhand.  Here's the simple benchmark script:

import time
import set
import sys

clock = time.clock

raw = range(10000)
times = [None]*20

print "Timing Set %s (Python %s)" % (set.__version__, sys.version)

print "Make 20 10k-items sets (no reps)...",
start = clock()
for i in times:
    s10k = set.Set(raw)
stend = clock()
print stend-start

witre = range(1000)*10
print "Make 20 1k-items sets (x10 reps)...",
for i in times:
    s1k1 = set.Set(witre)
stend = clock()
print stend-start

raw1 = range(500, 1500)
print "Make 20 more 1k-items sets (no reps)...",
for i in times:
    s1k2 = set.Set(raw1)
stend = clock()
print stend-start

print "20 unions of 1k-items sets 50% overlap...",
for i in times:
    result = s1k1 | s1k2
stend = clock()
print stend-start

print "20 inters of 1k-items sets 50% overlap...",
for i in times:
    result = s1k1 & s1k2
stend = clock()
print stend-start

print "20 diffes of 1k-items sets 50% overlap...",
for i in times:
    result = s1k1 - s1k2
stend = clock()
print stend-start

print "20 simdif of 1k-items sets 50% overlap...",
for i in times:
    result = s1k1 ^ s1k2
stend = clock()
print stend-start


And here's a few runs (with -O of course) on my PC:


[alex@lancelot has]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26)
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)])
Make 20 10k-items sets (no reps)... 0.21
Make 20 1k-items sets (x10 reps)... 0.36
Make 20 more 1k-items sets (no reps)... 0.38
20 unions of 1k-items sets 50% overlap... 0.43
20 inters of 1k-items sets 50% overlap... 0.92
20 diffes of 1k-items sets 50% overlap... 1.41
20 simdif of 1k-items sets 50% overlap... 2.38
[alex@lancelot has]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26)
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)])
Make 20 10k-items sets (no reps)... 0.22
Make 20 1k-items sets (x10 reps)... 0.37
Make 20 more 1k-items sets (no reps)... 0.39
20 unions of 1k-items sets 50% overlap... 0.44
20 inters of 1k-items sets 50% overlap... 0.93
20 diffes of 1k-items sets 50% overlap... 1.42
20 simdif of 1k-items sets 50% overlap... 2.39
[alex@lancelot has]$ cd ../is
[alex@lancelot is]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26)
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)])
Make 20 10k-items sets (no reps)... 0.21
Make 20 1k-items sets (x10 reps)... 0.37
Make 20 more 1k-items sets (no reps)... 0.39
20 unions of 1k-items sets 50% overlap... 0.44
20 inters of 1k-items sets 50% overlap... 0.93
20 diffes of 1k-items sets 50% overlap... 1.42
20 simdif of 1k-items sets 50% overlap... 2.38
[alex@lancelot is]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26)
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)])
Make 20 10k-items sets (no reps)... 0.22
Make 20 1k-items sets (x10 reps)... 0.38
Make 20 more 1k-items sets (no reps)... 0.4
20 unions of 1k-items sets 50% overlap... 0.44
20 inters of 1k-items sets 50% overlap... 0.93
20 diffes of 1k-items sets 50% overlap... 1.42
20 simdif of 1k-items sets 50% overlap... 2.41
[alex@lancelot is]$

They look much of a muchness to me.
Sorry about the version stuck at 1.5 -- forgot to update that, but
you can tell the difference by the directory name, 'is' and 'has' resp.:-).

Python 2.3 (built from CVS 22 hours ago) is substantially faster at some
tasks (intersections and differences):
[alex@lancelot has]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05)
[GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)])
Make 20 10k-items sets (no reps)... 0.21
Make 20 1k-items sets (x10 reps)... 0.36
Make 20 more 1k-items sets (no reps)... 0.37
20 unions of 1k-items sets 50% overlap... 0.42
20 inters of 1k-items sets 50% overlap... 0.75
20 diffes of 1k-items sets 50% overlap... 1.08
20 simdif of 1k-items sets 50% overlap... 1.73
[alex@lancelot has]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05)
[GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)])
Make 20 10k-items sets (no reps)... 0.21
Make 20 1k-items sets (x10 reps)... 0.36
Make 20 more 1k-items sets (no reps)... 0.37
20 unions of 1k-items sets 50% overlap... 0.42
20 inters of 1k-items sets 50% overlap... 0.75
20 diffes of 1k-items sets 50% overlap... 1.08
20 simdif of 1k-items sets 50% overlap... 1.74
[alex@lancelot has]$
[alex@lancelot is]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05)
[GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)])
Make 20 10k-items sets (no reps)... 0.21
Make 20 1k-items sets (x10 reps)... 0.35
Make 20 more 1k-items sets (no reps)... 0.37
20 unions of 1k-items sets 50% overlap... 0.41
20 inters of 1k-items sets 50% overlap... 0.74
20 diffes of 1k-items sets 50% overlap... 1.07
20 simdif of 1k-items sets 50% overlap... 1.72
[alex@lancelot is]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05)
[GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)])
Make 20 10k-items sets (no reps)... 0.21
Make 20 1k-items sets (x10 reps)... 0.36
Make 20 more 1k-items sets (no reps)... 0.38
20 unions of 1k-items sets 50% overlap... 0.42
20 inters of 1k-items sets 50% overlap... 0.75
20 diffes of 1k-items sets 50% overlap... 1.08
20 simdif of 1k-items sets 50% overlap... 1.73
[alex@lancelot is]$

but as you can see, again it's uniformly faster on both 'is' and 'has'
versions of sets.

The 'has' version thus seems preferable here.


Alex