[Patches] [ python-Patches-560379 ] Karatsuba multiplication

Sun, 11 Aug 2002 19:40:40 -0700

Patches item #560379, was opened at 2002-05-24 21:07
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=560379&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
>Resolution: Accepted
Priority: 5
Submitted By: Christopher A. Craig (ccraig)
Assigned to: Tim Peters (tim_one)
Summary: Karatsuba multiplication

Initial Comment:
Adds Karatsuba multiplication to Python.  
Patches longobject.c to use Karatsuba multiplication in
place
of gradeschool math.

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-08-11 22:40

Message:
Logged In: YES 
user_id=31435

Thanks!  I checked in some code building on this.  Changes 
included:

+ Adjusted whitespace to meet the standard (spaces 
after "if" and "for", flanking binary operators, etc).

+ The refcount fiddling in x_mul caused assorted system 
crashes if KeyboardInterrupt was raised during a multiply.  
Repaired that.

+ More comments and asserts.

+ Removed k_join and built "the answer" piecemeal into the 
result object in k_mul.  This allows to free more chunks of 
memory sooner, reducing highwater mark and the probable 
size of the working set.

Since the cache behavior is quite different now, it would be 
cool if you could run your tuning tests again.  The cutoff 
value is now a #define, KARATSUBA_CUTOFF near the top 
of longobject.c.

Until I can make time for more thorough testing, k_mul isn't 
called by default:  multiplication invokes k_mul if and only if 
an environment variable named KARAT exists (its value is 
irrelevant; just its existence matters).

----------------------------------------------------------------------

Comment By: Christopher A. Craig (ccraig)
Date: 2002-07-09 18:43

Message:
Logged In: YES 
user_id=135050

I've brought the code into compliance with the coding
standards in the PEP7, and added some comments that I
thought were in line with the rest of the file.  If there is
something else you would like me to do, please tell me. 

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-05 17:38

Message:
Logged In: YES 
user_id=6380

Tim thinks this is cool, but the code can use cleanup and 
comments.

Also, let's not add platform specific hacks (Christian can sell 
those as an add-on :-).

----------------------------------------------------------------------

Comment By: Christopher A. Craig (ccraig)
Date: 2002-05-25 19:41

Message:
Logged In: YES 
user_id=135050

I made the needed changes to make to split on the bigger
number (basically chaged to split on bigger number, and
changed all of the places that need to check to see if there
are no bits left), and the new one is a little bit faster,
so I'm uploading it too.

I had been thinking about fixed precision numbers when I
wrote it, so I honestly didn't consider the fact that I
could just shift the smaller number to 0 and throw it
away... :-)

----------------------------------------------------------------------

Comment By: Christopher A. Craig (ccraig)
Date: 2002-05-25 12:16

Message:
Logged In: YES 
user_id=135050

I just uploaded a graph with some sample timings in it.  
Red is a fence of 20. Green is a fence of 40. Blue is a
fence of 60. Black is done with unmodified Python 2.2.1.  

----------------------------------------------------------------------

Comment By: Christopher A. Craig (ccraig)
Date: 2002-05-25 01:53

Message:
Logged In: YES 
user_id=135050

I got 40 from testing.  Basically I generated 250 random
numbers each for a series of sizes between 5 and 2990 bits
long at 15 bit intervals (i.e. the word size), and stored it
in a dictionary.  Then timed 249 multiplies at each size for
a bunch of fence values and used gdchart to make a pretty
graph.   It cerntainly could be optimized better per
compiler/platform, but I don't know how much gain you'ld see.

I split on the smaller number because I guessed it would be
better.  My thought was that if I split on the smaller
number I'm guaranteed to reach the fence, at which point I
can use the gradeschool method at a near linear cost (since
it's O(n*m) and one of those two is at most the fence size).
 If I split on the larger number, I may run into a condition
where the smaller number is less than half the larger, but I
haven't reached the fence yet, and then gradeschool could be
much more expensive.

----------------------------------------------------------------------

Comment By: Christian Tismer (tismer)
Date: 2002-05-24 23:23

Message:
Logged In: YES 
user_id=105700

Hmm, not bad.

Q: You set the split fence at 40. Where does this number
come from? I think this could be optimzed per compiler/platform.

You say that you split based on the smaller number.
Why this? My intuitive guess would certainly be to always split
on the larger number. I just checked my Python implementation
which does this.
Open question: how to handle very small by very long the
best way? Probably the highschool version is better here,
and that might have led you to investigate the smaller one.
I'd say bosh should be checked.

good work! - cheers chris

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=560379&group_id=5470