How fast can we multiply?

Tim Peters tim_one at email.msn.com
Wed Jul 21 02:54:06 EDT 1999


[Tim]
>> Andrew Kuchling's wrapping gmp for Python is the most useful thing
>> that's ever been done in this area.

[Christian Tismer]
> I see. gmp makes a lot of sense, at least until now.
>
> On the other hand, if the hard-coded "C" basic elements are
> made fine and tiny, they give us the nuts and bolts to build
> the real algorithms on top of them.
>
> We could therefore do what gmp does, not much slower, but
> readable, in Python. No porting, no nothing, just using it.

I play with & investigate bigint algorithms in Python, & I enjoy it.  But
serious work in this area requires serious speed, and GMP doesn't compromise
on that -- from slaved-over assembler to low-level mutable packed storage
limbs, it's doing the right stuff for its task.  Factors of 5 count here
<wink>.

You do "long1 & long2" in Python, and it's handled by an all-purpose bitop
function that redispatches on the operation in each iteration of the inner
loop.  It's "like that" everywhere:  the bigint code was written for
compactness (longobject.c is only twice the size of intobject.c!), and
no-hassle porting to the feeblest machine on earth.  The compromises have
consequences (some good!  that code has been rock solid since the first
release, while it can take a week to track down all the bugfix patches to
the last official GMP release).
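To make the shape of that concrete, here's a toy rendering in Python (the
real thing is C in longobject.c; the digit lists and the operator strings
here are just my illustration, not the actual code):

def bitop(a_digits, b_digits, op):
    # a_digits, b_digits: little-endian lists of small fixed-width
    # digits (15-bit chunks in the longobject.c of that era), equal
    # length.  One generic loop serves &, | and ^, so the test on `op`
    # is repeated for every digit instead of being hoisted out of the
    # inner loop -- compact, at a measurable cost in speed.
    result = []
    for x, y in zip(a_digits, b_digits):
        if op == '&':
            result.append(x & y)
        elif op == '|':
            result.append(x | y)
        else:                      # '^'
            result.append(x ^ y)
    return result

print(bitop([0x7fff, 0x0003], [0x00ff, 0x0001], '&'))   # [255, 1]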

> And again I see a win for Python. It is able to show how
> algorithms work, in a very clear way using its clean
> syntax, and you could use the implementation of the math directly
> as a high-school text. *And* it just works, and is fast.
> Feels very, very promising to me.

For educational purposes, sure -- provided you can find enough people
seeking this kind of education <wink>.

[matmult]
>> Large dense matmult isn't very common.  Sing along:

> It is just so easy. Factoring matrix multiplication
> into its submatrices and then saving (I think it was one)
> multiplication - it would be a delight to write in
> Python. Not doing so seems to me like knowing
> quicksort, but using bubble sort all the time,
> not just below the break-even point (10-20 elements).
> But I'm telling you nothing new.

OTOH, Winograd's refinement of the matmult procedure requires 15 floating
adds to save a single multiply (7 block products instead of 8), where
straight 2x2 matmult requires only 4 adds; float + isn't much cheaper than
float * on many modern processors.  So the win here isn't nearly as pure as
for intmult, and the reduction is only from O(n**3) to O(n**2.81) operations
anyway.  There are reasons few vendors offer this form of matmult in their
libraries <0.5 wink>.
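For anyone who wants to play with the idea anyway, here's a toy sketch in
Python of the block recursion -- the classic 7-multiplication, 18-addition
Strassen form rather than Winograd's 15-addition variant -- assuming square
matrices of power-of-two size stored as lists of lists (all of that is my
framing for illustration, not production code):

def madd(X, Y):
    # Elementwise matrix add.
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def msub(X, Y):
    # Elementwise matrix subtract.
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def strassen(A, B, cutoff=64):
    # A, B: n x n lists of lists, n a power of two.
    n = len(A)
    if n <= cutoff:
        # Schoolbook O(n**3) multiply below the break-even point.
        return [[sum(A[i][k] * B[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
    h = n // 2
    def quad(M, r, c):
        return [row[c:c + h] for row in M[r:r + h]]
    A11, A12, A21, A22 = quad(A, 0, 0), quad(A, 0, h), quad(A, h, 0), quad(A, h, h)
    B11, B12, B21, B22 = quad(B, 0, 0), quad(B, 0, h), quad(B, h, 0), quad(B, h, h)
    # Seven block products instead of eight -- that's where
    # O(n**lg 7) ~= O(n**2.81) comes from.
    M1 = strassen(madd(A11, A22), madd(B11, B22), cutoff)
    M2 = strassen(madd(A21, A22), B11, cutoff)
    M3 = strassen(A11, msub(B12, B22), cutoff)
    M4 = strassen(A22, msub(B21, B11), cutoff)
    M5 = strassen(madd(A11, A12), B22, cutoff)
    M6 = strassen(msub(A21, A11), madd(B11, B12), cutoff)
    M7 = strassen(msub(A12, A22), madd(B21, B22), cutoff)
    C11 = madd(msub(madd(M1, M4), M5), M7)
    C12 = madd(M3, M5)
    C21 = madd(M2, M4)
    C22 = madd(msub(madd(M1, M3), M2), M6)
    # Stitch the four quadrants back together, row by row.
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])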

> ...
> As another little challenge:
>
> Tim already mentioned Jurjen N.E. Bos' high-precision math
> library. It is on Python's contrib site.
> I have the strong impression that nobody has used it
> for a year now. python.org also fails to advertise it.
>
> It could use a little rework, to take advantage of 1.5.2's features.
> And if someone marries it with a fast multiplication, that
> could make a neat tool.

I use it often; I'm not sure how practical it is <wink>, but it's great fun!
It would benefit a *lot* from a constant-time nbits.  The clearest quick win
is to get rid of all the code that special-cases "big" left shift counts;
a patch to remove that restriction went into Python shortly after Jurjen
vanished.
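For concreteness, by "nbits" I mean the count of significant bits in abs(n),
i.e. the smallest k with abs(n) < 2**k.  A pure-Python sketch (nowhere near
constant time -- that would take exposing the digit count from the C level):

def nbits(n):
    # Smallest k such that abs(n) < 2**k; nbits(0) == 0.
    n = abs(n)
    k = 0
    while n >= 1 << 16:    # peel 16 bits at a time first
        n >>= 16
        k += 16
    while n:               # then finish bit by bit
        n >>= 1
        k += 1
    return k

assert nbits(0) == 0 and nbits(255) == 8 and nbits(256) == 9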

> If I were a student, I would propose to my advisor that I do such
> things as my semester homework. What a delight :-)

It would make a great project for any number-head.  OTOH, Jurjen's approach
was both highly clever and highly idiosyncratic, and the internals of the
code are difficult to understand.  Some of the examples in the docstring no
longer work.  And some of the algorithms are clearly flawed; e.g.,

>>> a = r(1)
>>> b = r(1)
>>> a.mant = 8; a.expon = 0
>>> b.mant = 3; b.expon = 0
>>> a
8.+-1
>>> b
3.+-1
>>> a+b
8.+-4
>>>

That is, the result interval is too small:  8.+-1 covers [7, 9] and 3.+-1
covers [2, 4], so the true sum lies somewhere in [9, 13], yet 8.+-4 tops out
at 12 -- a+b *can* be as large as 13.  A similar thing happens whenever the
before-rounding internal mantissa of a sum is congruent to 3 mod 4.

A good semester project would be to come up with -- and prove correct --
algorithms for {+ - * / sqrt exp log input output} for numbers in this
format (a very compact flavor of variable-precision floating-point
intervals).
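To make the + case concrete, here's a sketch of outward-rounded interval
addition.  The (mant, err, expon) triple below -- meaning the true value
lies somewhere in [(mant-err)*2**expon, (mant+err)*2**expon] -- is my own
framing for illustration, not Jurjen's actual internal format:

def nbits(n):
    # Significant bits in abs(n); speed doesn't matter for the sketch.
    n, k = abs(n), 0
    while n:
        n >>= 1
        k += 1
    return k

def ceil_div(a, b):
    # Ceiling of a / b for a >= 0, b > 0.
    return -((-a) // b)

def interval_add(x, y, max_mant_bits=53):
    (m1, r1, e1), (m2, r2, e2) = x, y
    # Align both operands to the smaller exponent so the mantissas
    # share one scale, then add mantissas and error bounds.
    e = min(e1, e2)
    m1, r1 = m1 << (e1 - e), r1 << (e1 - e)
    m2, r2 = m2 << (e2 - e), r2 << (e2 - e)
    m, r = m1 + m2, r1 + r2
    # If the mantissa got too wide, drop k low bits.  The error bound
    # is rounded *up*, plus one extra unit to absorb the (less than
    # one unit) truncation error of the shifted mantissa itself --
    # never rounded down, so the interval can only widen.
    k = nbits(m) - max_mant_bits
    if k > 0:
        m, r, e = m >> k, ceil_div(r, 1 << k) + 1, e + k
    return (m, r, e)

# Tim's example: 8.+-1 plus 3.+-1 has to cover 13.
print(interval_add((8, 1, 0), (3, 1, 0)))    # (11, 2, 0), i.e. [9, 13]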

unary-minus-is-easy<wink>-ly y'rs  - tim





