
On Thursday 18 July 2002 09:50 pm, Guido van Rossum wrote:
I believe I recommended to Greg to make sets "have" a dict instead of "being" dicts, and I think he agreed. But I guess he never got to implementing that change.
Right. OK, guess I'll make a new patch using delegation instead of inheritance, then.
Maybe benchmark the performance too. If the "has" version is much slower, perhaps we could remove unwanted interfaces from the public API by overriding them with something that raises an exception (and rename the internal versions to some internal name if they are needed).
I've just updated patch 580995 with the has-A rather than is-A version. OK, I'll now run some simple benchmarks... Looks good, offhand. Here's the simple benchmark script: import time import set import sys clock = time.clock raw = range(10000) times = [None]*20 print "Timing Set %s (Python %s)" % (set.__version__, sys.version) print "Make 20 10k-items sets (no reps)...", start = clock() for i in times: s10k = set.Set(raw) stend = clock() print stend-start witre = range(1000)*10 print "Make 20 1k-items sets (x10 reps)...", for i in times: s1k1 = set.Set(witre) stend = clock() print stend-start raw1 = range(500, 1500) print "Make 20 more 1k-items sets (no reps)...", for i in times: s1k2 = set.Set(raw1) stend = clock() print stend-start print "20 unions of 1k-items sets 50% overlap...", for i in times: result = s1k1 | s1k2 stend = clock() print stend-start print "20 inters of 1k-items sets 50% overlap...", for i in times: result = s1k1 & s1k2 stend = clock() print stend-start print "20 diffes of 1k-items sets 50% overlap...", for i in times: result = s1k1 - s1k2 stend = clock() print stend-start print "20 simdif of 1k-items sets 50% overlap...", for i in times: result = s1k1 ^ s1k2 stend = clock() print stend-start And here's a few runs (with -O of course) on my PC: [alex@lancelot has]$ python -O ../bench_set.py Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)]) Make 20 10k-items sets (no reps)... 0.21 Make 20 1k-items sets (x10 reps)... 0.36 Make 20 more 1k-items sets (no reps)... 0.38 20 unions of 1k-items sets 50% overlap... 0.43 20 inters of 1k-items sets 50% overlap... 0.92 20 diffes of 1k-items sets 50% overlap... 1.41 20 simdif of 1k-items sets 50% overlap... 2.38 [alex@lancelot has]$ python -O ../bench_set.py Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)]) Make 20 10k-items sets (no reps)... 0.22 Make 20 1k-items sets (x10 reps)... 0.37 Make 20 more 1k-items sets (no reps)... 0.39 20 unions of 1k-items sets 50% overlap... 0.44 20 inters of 1k-items sets 50% overlap... 0.93 20 diffes of 1k-items sets 50% overlap... 1.42 20 simdif of 1k-items sets 50% overlap... 2.39 [alex@lancelot has]$ cd ../is [alex@lancelot is]$ python -O ../bench_set.py Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)]) Make 20 10k-items sets (no reps)... 0.21 Make 20 1k-items sets (x10 reps)... 0.37 Make 20 more 1k-items sets (no reps)... 0.39 20 unions of 1k-items sets 50% overlap... 0.44 20 inters of 1k-items sets 50% overlap... 0.93 20 diffes of 1k-items sets 50% overlap... 1.42 20 simdif of 1k-items sets 50% overlap... 2.38 [alex@lancelot is]$ python -O ../bench_set.py Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)]) Make 20 10k-items sets (no reps)... 0.22 Make 20 1k-items sets (x10 reps)... 0.38 Make 20 more 1k-items sets (no reps)... 0.4 20 unions of 1k-items sets 50% overlap... 0.44 20 inters of 1k-items sets 50% overlap... 0.93 20 diffes of 1k-items sets 50% overlap... 1.42 20 simdif of 1k-items sets 50% overlap... 2.41 [alex@lancelot is]$ They look much of a muchness to me. Sorry about the version stuck at 1.5 -- forgot to update that, but you can tell the difference by the directory name, 'is' and 'has' resp.:-). Python 2.3 (built from CVS 22 hours ago) is substantially faster at some tasks (intersections and differences): [alex@lancelot has]$ python -O ../bench_set.py Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05) [GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)]) Make 20 10k-items sets (no reps)... 0.21 Make 20 1k-items sets (x10 reps)... 0.36 Make 20 more 1k-items sets (no reps)... 0.37 20 unions of 1k-items sets 50% overlap... 0.42 20 inters of 1k-items sets 50% overlap... 0.75 20 diffes of 1k-items sets 50% overlap... 1.08 20 simdif of 1k-items sets 50% overlap... 1.73 [alex@lancelot has]$ python -O ../bench_set.py Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05) [GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)]) Make 20 10k-items sets (no reps)... 0.21 Make 20 1k-items sets (x10 reps)... 0.36 Make 20 more 1k-items sets (no reps)... 0.37 20 unions of 1k-items sets 50% overlap... 0.42 20 inters of 1k-items sets 50% overlap... 0.75 20 diffes of 1k-items sets 50% overlap... 1.08 20 simdif of 1k-items sets 50% overlap... 1.74 [alex@lancelot has]$ [alex@lancelot is]$ python -O ../bench_set.py Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05) [GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)]) Make 20 10k-items sets (no reps)... 0.21 Make 20 1k-items sets (x10 reps)... 0.35 Make 20 more 1k-items sets (no reps)... 0.37 20 unions of 1k-items sets 50% overlap... 0.41 20 inters of 1k-items sets 50% overlap... 0.74 20 diffes of 1k-items sets 50% overlap... 1.07 20 simdif of 1k-items sets 50% overlap... 1.72 [alex@lancelot is]$ python -O ../bench_set.py Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05) [GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)]) Make 20 10k-items sets (no reps)... 0.21 Make 20 1k-items sets (x10 reps)... 0.36 Make 20 more 1k-items sets (no reps)... 0.38 20 unions of 1k-items sets 50% overlap... 0.42 20 inters of 1k-items sets 50% overlap... 0.75 20 diffes of 1k-items sets 50% overlap... 1.08 20 simdif of 1k-items sets 50% overlap... 1.73 [alex@lancelot is]$ but as you can see, again it's uniformly faster on both 'is' and 'has' versions of sets. The 'has' version thus seems preferable here. Alex