While we're on the subject, I just wanted to go over something about
the combination function, defined as

                    n!
   combo(n,k) = --------
                k!(n-k)!

You find this guy all over in math.  You can use it to derive the kth
term in row n of Pascal's [famous] Triangle, for example.

Citations:
http://www.mathforum.com/epigone/math-learn/zulphingyee/
http://www.mathforum.com/epigone/math-learn/snoipumkhee/

But it's wasteful to implement this expression literally, without
interpretation.  Terms on the bottom inevitably cancel with terms on
the top, so you end up "throwing away" some of the multiplications --
so why do 'em in the first place?  For example, consider

                 1*2*3*4*5*6*7*8*9*10
   combo(10,4) = ---------------------
                 1*2*3*4 * 1*2*3*4*5*6

You can use either the k! or the (n-k)! to cancel multiplications in
the numerator.  One should take one's pick and do so, rather than
carry out all multiplications both above and below.

Using Guido's little timer guy from his excellent essay at
http://www.python.org/doc/essays/list2str.html (scroll down), we can
test the relative efficacy of canceling (goodcombo) vs. ignoring the
opportunity (badcombo):
from operator import mul

def goodcombo(n,k):
    return reduce(mul,range(n,n-k,-1))/reduce(mul,range(1,k+1))
def badcombo(n,k):
    return (reduce(mul,range(1,n+1))/
            (reduce(mul,range(1,k+1))*reduce(mul,range(1,n-k+1))))
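[Both definitions agree on the combo(10,4) example above; a quick
check, not in the original post, shows they only differ in how much
multiplying gets done:]

>>> goodcombo(10, 4)    # multiplies only 10*9*8*7, then divides by 4! = 24
210
>>> badcombo(10, 4)     # multiplies out 10!, 4! and 6! in full
210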
import time

def timing(f, n, a, b):
    print f.__name__,
    r = range(n)
    t1 = time.clock()
    for i in r:
        f(a,b); f(a,b); f(a,b); f(a,b); f(a,b); f(a,b)
        f(a,b); f(a,b); f(a,b); f(a,b)
    t2 = time.clock()
    print round(t2-t1, 3)
>>> timing(goodcombo,500,10,6)
goodcombo 0.275
>>> timing(badcombo,500,11,6)
badcombo 0.453
I rest my case.

You'll frequently find variants of badcombo in Python modules, e.g.
here in Robert Kern's totally awesome and way cool clifford.py at
http://starship.python.net/crew/kernr/source/clifford.py

def comb(n, k):
    """\
    Returns /n\\
            \\k/

    comb(n, k) --> PyInt
    """
    def fact(n):
        return Numeric.multiply.reduce(range(1, n+1))
    return fact(n) / (fact(k) * fact(n-k))

Easier to read, maybe, but not faster to run.

Speaking of Clifford Algebra, that's the next big thing in geometric
algebra (GA) these days, and finding a way to bring GA to secondary
schools is a front-lines curriculum writing challenge.  Maybe Python
can help, as per this post to a Community Colleges list (one of mine,
of yesterday):

http://www.mathforum.com/epigone/mathedcc/lorswinbleh

Kirby
[Kirby Urner]
> While we're on the subject, I just wanted to go over something about
> the combination function, defined as
>                     n!
>    combo(n,k) = --------
>                 k!(n-k)!
> ...
> But it's wasteful to implement this expression literally,
> ...
> Using Guido's little timer guy from his excellent essay at
> http://www.python.org/doc/essays/list2str.html (scroll down), we can
> test the relative efficacy of canceling (goodcombo) vs. ignoring the
> opportunity (badcombo):
> def goodcombo(n,k):
>     return reduce(mul,range(n,n-k,-1))/reduce(mul,range(1,k+1))
> def badcombo(n,k):
>     return (reduce(mul,range(1,n+1))/
>             (reduce(mul,range(1,k+1))*reduce(mul,range(1,n-k+1))))
Oh, Kirby, you have got to get over your unhealthy fascination with
reduce <wink>.

Seriously, this code is unreadable.  Python is supposed to be pretty!
Here's a tease:

    goodcombo 1.506
    badcombo 1.659
    shortcomb 0.615

"shortcomb" is shown later.  No reduce.
> I rest my case.
Ditto <wink>.
> >>> timing(goodcombo,500,10,6)
> goodcombo 0.275
> >>> timing(badcombo,500,11,6)
> badcombo 0.453
Given the way goodcombo and badcombo work, passing 10,6 to goodcombo
but 11,6 to badcombo gave goodcombo an unfair advantage.

Here's the reduce-free shortcomb:

def shortcomb(m, n):
    """m, n -> number of combinations of m items, n at a time.

    m >= n >= 0 required.

    Doesn't check its arguments.  Assumes nothing will overflow --
    tough luck if it does.
    """
    if n > (m >> 1):
        n = m-n
    if n == 0:
        return 1
    result = m
    i = 2
    m, n = m-1, n-1
    while n:
        # assert (result * m) % i == 0
        result = result * m / i
        i += 1
        n -= 1
        m -= 1
    return result

Timing was done via

    timing(goodcombo, 5000, 11, 6)
    timing(badcombo, 5000, 11, 6)
    timing(shortcomb, 5000, 11, 6)

Now in practice I don't use shortcomb either, for reasons you might be
able to guess from the snotty comments I put in its docstring <wink>.
Here's the version I actually use, and heartily recommend it.  It's
slower than badcombo!  But it complains if you feed it bad arguments,
and uses bigint arithmetic so that overflow isn't a concern (which,
btw, is the real reason it's slower):

def _chop(n):
    """n -> int if it fits, else long."""
    try:
        return int(n)
    except OverflowError:
        return n

def comb(m, n):
    """m, n -> number of combinations of m items, n at a time.

    m >= n >= 0 required.
    """
    if not m >= n >= 0:
        raise ValueError("m >= n >= 0 required: " + `m, n`)
    if n > (m >> 1):
        n = m-n
    if n == 0:
        return 1
    result = long(m)
    i = 2
    m, n = m-1, n-1
    while n:
        # assert (result * m) % i == 0
        result = result * m / i
        i += 1
        n -= 1
        m -= 1
    return _chop(result)

Even shortcomb is less prone to spurious overflows than goodcombo or
badcombo, though, because it divides out each part of the denominator
just as soon as it's certain the numerator is divisible by it:
>>> badcombo(52, 7)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "timecomb.py", line 60, in badcombo
    return (reduce(mul,range(1,n+1))/
OverflowError: integer multiplication

>>> goodcombo(52, 7)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "timecomb.py", line 57, in goodcombo
    return reduce(mul,range(n,n-k,-1))/reduce(mul,range(1,k+1))
OverflowError: integer multiplication

>>> shortcomb(52, 7)
133784560
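[An aside that isn't part of Tim's post: the commented-out assert in
shortcomb and comb always holds because the product of i consecutive
integers is divisible by i!, so every division in the loop is exact,
and the running value is always an exact intermediate coefficient
rather than a huge partial factorial -- which is why this approach
overflows so much later.  A small traced variant, comb_traced being a
made-up name, makes that visible; Python 2 style, like the rest of the
thread:]

def comb_traced(m, n):
    # Same algorithm as shortcomb, with the running value printed so
    # the "divide as you go" behaviour can be watched step by step.
    if n > (m >> 1):
        n = m - n
    if n == 0:
        return 1
    result = m
    i = 2
    m, n = m - 1, n - 1
    while n:
        assert (result * m) % i == 0    # i consecutive factors so far,
        result = result * m / i         # hence divisible by i!
        print 'after dividing by %d: result = %d' % (i, result)
        i, n, m = i + 1, n - 1, m - 1
    return result

[For example, comb_traced(10, 4) prints the running results 45, 120,
210 and returns 210 -- the running value never overshoots the answer.]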
> Oh, Kirby, you have got to get over your unhealthy fascination with
> reduce <wink>.
It's a phase, like puberty.
> Seriously, this code is unreadable.  Python is supposed to be pretty!
> Here's a tease:
> goodcombo 1.506
> badcombo 1.659
> shortcomb 0.615
> timing(goodcombo, 5000, 11, 6)
> timing(badcombo, 5000, 11, 6)
> timing(shortcomb, 5000, 11, 6)
Oooo, fast machine. Tomorrow I upgrade from this AMD 400 Mhz to a 1.2 Mhz AMD Thunderbird -- will retest for kicks. But then, this is Windows. Don't tell me you're on a 386.
> Given the way goodcombo and badcombo work, passing 10,6 to goodcombo
> but 11,6 to badcombo gave goodcombo an unfair advantage.
That was an error, to stack the deck like that.  goodcombo is still
faster than badcombo though.  Here's another version of goodcombo:

def goodcombo(n,k):
    """ Returns n!/k!(n-k)! where n>=k """
    numer = reduce(mul,[1L]+range(n-k+1,n+1))
    return numer/reduce(mul,[1L]+range(2,k+1))

Works with long ints, but still slower than yours, and no parameter
checking.  I see now that ending with that fatso division is less
efficient than your "divide as you go" strategy.
>>> timing(goodchoice2,5000,52,7)
goodchoice2 7.037
>>> timing(comb,5000,52,7)
comb 5.699
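[A quick check, not in the original message: seeding each reduce with
[1L] is what keeps this version from hitting the OverflowError shown
earlier, since all the intermediate products are longs:]

>>> goodcombo(52, 7)
133784560L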
comb is well-designed.  Like your _chop method too -- great stuff!
Thank you Guru Peters.

Kirby
[Kirby Urner]
> ... Oooo, fast machine.
866 Pentium -- a bit behind the times. But I'm running what will eventually become Python 2.2, and it's a bit faster than 2.1.
> Tomorrow I upgrade from this AMD 400 Mhz to a 1.2 Mhz AMD Thunderbird --
Cool! Especially if you meant GHz <wink>.
> will retest for kicks.  But then, this is Windows.
Oh, same here -- Win98SE. It's a pain for development work (& trying to time things on it reliably is almost hopeless), but it keeps me much more sympathetic to my sisters and the majority of Python's new users than I would be otherwise. Now I can dismiss complaints with *empathy* <wink>.
> ... Here's another version of goodcombo:
> def goodcombo(n,k):
>     """ Returns n!/k!(n-k)! where n>=k """
>     numer = reduce(mul,[1L]+range(n-k+1,n+1))
>     return numer/reduce(mul,[1L]+range(2,k+1))
> Works with long ints, but still slower than yours, and no parameter
> checking.  I see now that ending with that fatso division is less
> efficient than your "divide as you go" strategy.
That's especially true when working with long ints, and you already
know why!  Here's a big int:

    29837498239837948798274982794283749383279384

Forget computers for a moment: would *you* rather divide that by 7, or
by 3489798?  Python too.  The routine to divide by "a digit" is much
simpler than the routine to divide by "a whole bunch of digits".
Sophisticated bigint packages don't care so much, but Python's was
written for correctness and portability rather than for peak speed.

BTW, "a digit" in the bigint internals is actually in range(2**15), so
Python is as happy to divide by 27777 as by 7.  Go over that boundary,
though, and speed takes a hit.  The comb() routine first divides by 2,
then by 3, etc, so in practice never needs to do "a fancy" division.

When you're working with small ints, though, the hardware is doing the
division and it doesn't care so much.  If you time it, you'll often
find that reduce() is slower than an equivalent loop.
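[To make that last point concrete, here is a reduce-free variant of
goodcombo in the same Python 2 style as the rest of the thread.  It is
an illustration rather than code from the thread, and loopcombo is a
made-up name:]

def loopcombo(n, k):
    # Same arithmetic as goodcombo, but with plain for-loops in place
    # of reduce(mul, ...): multiply the k largest factors of n!, then
    # divide by k! in one shot.
    numer = 1
    for term in range(n, n-k, -1):
        numer = numer * term
    denom = 1
    for term in range(2, k+1):
        denom = denom * term
    return numer / denom

[Running timing(loopcombo, 500, 10, 6) next to timing(goodcombo, 500,
10, 6) with the helper above is an easy way to check the reduce-vs-loop
claim on any particular machine.]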
> ... comb is well-designed.
Thanks! It was part of an Industrial Package, so it had to be <wink>.
> Like your _chop method too -- great stuff!
Scheme has it all over Python there: it's a PITA that Python has more than one flavor of int! We're slowly moving toward eradicating the user-visible difference. It's slow going, though, because it's hard to do it in a way that's 100% backward compatible; sooner or later I'm afraid we're going to have to break some old code.
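[A small illustration of the wart being described here -- not from the
thread, and assuming the Python 2.1-era behavior that _chop above is
written around: int() of a long that won't fit in a machine word
raises OverflowError rather than quietly handing back the long, so
code that wants "plain int when possible" has to catch it.]

huge = 10L ** 20          # a long far too big for a 32-bit machine int
try:
    huge = int(huge)      # old Pythons raise OverflowError here ...
except OverflowError:
    pass                  # ... so _chop-style code falls back to the long
print huge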
PS: good idea, about getting into compression algorithms.  I'd like to
learn more about all that myself.  Could be lotsa fun, and the links
to crypto and information theory are promising too.

Now for something completely different -- and because I said I would...
> goodcombo 1.506
> badcombo 1.659
> shortcomb 0.615
> Oooo, fast machine.  Tomorrow I upgrade from this AMD 400 Mhz to a
> 1.2 Mhz AMD Thunderbird -- will retest for kicks.
      ^ G
New machine:
goodcombo 0.667
badcombo 1.053
shortcomb 0.427
Then:  goodchoice2 7.037
Now:   goodchoice2 1.778

Then:  comb 5.699
Now:   comb 1.474

Conclusion: the faster chip is faster.

Kirby