[Patches] [ python-Patches-936813 ] fast modular exponentiation

Thu Jul 22 10:39:16 CEST 2004

Patches item #936813, was opened at 2004-04-17 01:16
Message generated for change (Comment added) made by trevp
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=936813&group_id=5470

Category: Core (C code)
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Trevor Perrin (trevp)
Assigned to: Nobody/Anonymous (nobody)
Summary: fast modular exponentiation

Initial Comment:

For crypto-sized numbers, Python mod-exp is several
times slower than GMP or OpenSSL (6x or more).  Those
libraries do crazy special-case stuff, + assembly,
platform-specific tuning, and so on.

However, there's some low-hanging fruit: this patch has
a few basic optimizations giving a 2-3x speedup for
numbers in the 1000-8000 bit range (that's what I've
mostly tested; but the patch should improve, or at
least not hurt, everything else):

 - x_mul() is special-cased for squaring, which is
almost twice as fast as general multiplication.

 - x_mul() uses pointers instead of indices for
iteration, giving ~10% speedup (under VC6).

 - the right-to-left square-and-multiply exponentiation
algorithm is replaced with a left-to-right
square-and-multiply, which takes advantage of small bases.

 - when the exponent is above a certain size, "k-ary"
exponentiation is used to reduce the number of
multiplications via precalculation.

 - when the modulus is odd, Montgomery reduction is used.

 - the Karatsuba cutoff seems too low.  For
multiplicands in the range of 500-5000 bits, Karatsuba
slows multiplication by around ~25% (VC6sp4, Intel P4M
1.7 Ghz).  For larger numbers, the benefits of
Karatsuba are less than they could be.

 Currently, the cutoff is 35 digits (525 bits).  I've
tried 70, 140, 280, and 560.  70, 140, and 280 are
roughly the same: they don't slow down small values,
and they have good speedup on large ones.  560 is not
quite as good for large values, but at least it doesn't
hurt small ones.

I know this is platform-dependent, but I think we
should err on the side of making the cutoff too high
and losing some optimization, instead of putting it too
low and slowing things down.  I suggest 70.

A couple misc. things:

 - Negative exponents with a modulus no longer give an
error, when the base is coprime with the modulus. 
Instead, it calculates the multiplicative inverse of
the base with respect to the modulus, using the
extended euclidean algorithm, and exponentiates that.

 Libraries like GMP and LibTomMath work the same way. 
Being able to take inverses mod a number is useful for
cryptography (e.g. RSA, DSA, and Elgamal).

 - The diff includes patch 923643, which supports
converting longs to byte-strings.  Ignore the last few
diff entries, if you don't want this.

 - I haven't looked into harmonizing with int_pow(). 
Something may have to be done.

----------------------------------------------------------------------

>Comment By: Trevor Perrin (trevp)
Date: 2004-07-22 01:39

Message:
Logged In: YES 
user_id=973611

Pragmatics isn't my strong suit... but I get your drift :-).
 I split it into 3 diffs:
 1) x_mul optimizations: (pointers instead of indices,
special-case squaring, changing Karatsuba cutoff)
 2) rewriting long_pow() for left-to-right 5-ary
 3) Montgomery reduction.  This also includes l_invmod(),
since it's necessary for Montgomery.

I've left out the code which exposes l_invmod() to the user
(and associated docs, tests, and intobject changes).  We
could slap that on afterwards or not...

Anyways, these are applied sequentially:
longobject.c + longobject1.diff = longobject1.c
longobject1.c + longobject2.diff = longobject2.c
longobject2.c + longobject2.diff = longobject3.c

Should I open new tracker items for them?

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2004-07-21 12:29

Message:
Logged In: YES 
user_id=31435

Pragmatics are a real problem here, Trevor.  I don't foresee 
being able to make a solid block of sufficient hours to give to 
reviewing this before Python 2.4 is history (which is why I've 
left this patch unassigned, BTW -- I just can't promise to 
make enough time).  So if nobody else can volunteer to 
review it, that alone is likely to leave the patch sitting here 
unapplied.

But there are several independent changes in this patch, and 
it *could* be broken into several smaller patches.  I tossed 
that bait out before, but you didn't bite.  You should <wink>.

----------------------------------------------------------------------

Comment By: Trevor Perrin (trevp)
Date: 2004-07-19 04:00

Message:
Logged In: YES 
user_id=973611

Tim, thanks for the feedback.  I'm uploading a new patch
against CVS latest that fixes those issues, and adds docs
and tests.  Also, I cleaned up the code quite a bit, and got
it properly handling (I hope) all the varied combinations of
ints/longs, positives/negatives/zeros etc..

Unfortunately, Montgomery is the bulk of the speedup:
http://trevp.net/long_pow/

But I could split out the negative exponent handling into a
separate patch, if that would be easier.

Anyways, I'd like to add more tests for the exponentiation
stuff.  Aside from that, I think the patch is complete.  And
more robust than previously, though I still wouldn't trust
it until another person or two gives it a serious
looking-over....

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2004-07-16 20:06

Message:
Logged In: YES 
user_id=31435

Notes after a brief eyeball scan:

Note that the expression

a & 1 == 1

groups as

a & (1 == 1)

in C -- comparisons have higher precedence in C than bit-
fiddling operators.  Stuff like that is usually best resolved by 
explicitly parenthesizing any "impure" expression fiddling with 
bits.  In this case, in a boolean expression plain

a & 1

has the hoped-for effect. and is clearer anyway.

Would be better to use "**" than "^" in comments when 
exponentiation is intended, since "^" means xor in both 
Python and C.

Doc changes are needed, because you're changing visible 
semantics in some cases.

Tests are needed, especially for new semantics.

l_invmod can return NULL for more than one reason, but one 
of its callers ignores this, assuming that all NULL returns are 
due to lack of coprimality.  It's unreasonable to, e.g., replace 
a MemoryError with a complaint about coprimality; this needs 
reworking.  l_invmod should probably set an exception in 
the "not coprime" case.  As is, it's a weird function, 
sometimes setting an exception when it returns NULL, but not 
setting one when coprimality doesn't obtain.  That makes life 
difficult for callers (which isn't apparent in the patch, 
because its callers are currently ignoring this issue).

The Montgomery reduction gimmicks grossly complicate this 
patch -- they're long-winded and hard to follow.  That may 
come with the territory, but it's the only part of the patch 
that made me want to vomit <wink>.  I'd be happier if it 
weren't there, for aesthetic, clarity, and maintainability 
reasons.   How much of a speedup does it actually buy?

You're right that int pow must deliver the same results as 
long pow, so code is needed for that too.  "short int" 
versus "unbounded int" is increasingly meant to be an invisible 
internal implementation detail in Python.  I'm also in favor of 
giving this meaning to modular negative exponents, btw, so 
no problem with that.  An easy way would be to change int 
pow to delegate to long pow when this is needed.

Pragmatics:  there's a better chance of making 2.4 if the 
patch were done in bite-size stages.  For example, no doc 
changes are needed to switch to 5-ary left-to-right 
exponentation, and that has no effect on the int 
implementation either, etc.  A patch that did just that much 
probably would have gone in a long time ago.

----------------------------------------------------------------------

Comment By: Trevor Perrin (trevp)
Date: 2004-07-13 01:04

Message:
Logged In: YES 
user_id=973611

Uploading 2nd version of longobject.diff - the only change
is that patch 923643 is removed from this diff.  That was a
diff for converting longs to byte-strings, which I
unnecessarily left in.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=936813&group_id=5470