[Python-Dev] negative modular exponentiation

Trevor Perrin trevp at trevp.net
Tue May 4 22:49:38 EDT 2004


Hi Armin, Guido, Tim, everyone,

Thanks for looking at this patch!  I've submitted a few other 
crypto-related patches too.  They're all waiting eagerly for good, loving 
people to adopt them :-)...

They've all got basically the same motivation.  If I explain that, it might 
help people decide whether they're worth it or not:

Rationale for some crypto patches in 2.4
----------------------------------------
Almost all crypto protocols require the same fast or OS-dependent 
primitives: modular exponentiation, hashing, ciphers, and 
entropy-harvesting.  Everything else can be done in Python.

To get these primitives in Python you could install M2Crypto (needing 
OpenSSL and SWIG) or pycrypto (needing GMP).  But that complicates software 
distribution.  So I was hoping the few missing primitives could be added to 
2.4, enabling fast and portable pure-Python crypto.

I've submitted the below patches.  The first 2 aren't that important:

#923643, long <-> byte-string:  Convenience methods so longs are easier to 
read/write in network protocols:
   n = long(someString, 256)  # str -> long
   s = n.tostring()  # long -> str
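(For the curious, the conversions are easy to sketch in pure Python; the helper names below are mine, not the patch's API, and I've used modern bytes-based spelling:)

```python
# Rough pure-Python equivalents of the proposed long <-> byte-string
# conversions.  Big-endian, nonnegative values only.

def long_from_bytes(data):
    """Read a big-endian byte string as a nonnegative integer."""
    n = 0
    for byte in data:        # iterating bytes yields ints
        n = (n << 8) | byte
    return n

def long_to_bytes(n):
    """Write a nonnegative integer as a big-endian byte string."""
    if n == 0:
        return b'\x00'
    out = bytearray()
    while n:
        out.append(n & 0xFF)
        n >>= 8
    out.reverse()
    return bytes(out)
```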

#935454, sha256 module:  Since SHA-256 is preferred over SHA-1 for certain 
things these days.

#934711, platform-specific entropy:  An os.urandom() function providing the 
platform equivalent of /dev/urandom (CryptGenRandom on Windows, presumably 
SecureRandom on Java).
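(Usage would be as simple as asking for n bytes of entropy; something like:)

```python
import os

# The proposed interface: n bytes from the platform entropy source
# (/dev/urandom on Unix, CryptGenRandom on Windows).
seed = os.urandom(16)    # e.g. material for a 128-bit session key
```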

#936813, faster modular exponentiation...

The only thing left is block ciphers (AES, maybe 3DES).  Those would be 
easy (just copy from pycrypto) but I think this is courting enough 
controversy at the moment...


Anyways, about modular exponentiation:

On crypto-sized numbers (1000-8000 bits) on a P4, Python mod-exp is ~6-12x 
slower than GMP.  With this patch it's only ~3-4x slower, which is on par 
with another C library I tested (LibTomCrypt).

Mod-exp pretty much determines how long an SSL or IPsec handshake takes, 
or how long it takes to sign or decrypt a PGP message.  These operations 
often take a sizeable fraction of a second, so they're worth speeding up.
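(For reference, mod-exp in Python is the three-argument pow().  The textbook baseline is plain square-and-multiply, sketched below; the patch presumably gets its speed from more elaborate techniques, so take this only as an illustration of the operation itself:)

```python
def modexp(base, exp, mod):
    """Textbook right-to-left square-and-multiply; computes the same
    thing as the built-in pow(base, exp, mod), only slower.
    Assumes exp >= 0 and mod > 1."""
    result = 1
    base %= mod
    while exp:
        if exp & 1:                       # low bit set: multiply in
            result = (result * base) % mod
        base = (base * base) % mod        # square for the next bit
        exp >>= 1
    return result
```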

[Tim]
>The Python long implementation has a
>great track record wrt portability and lack of bugs, and gonzo optimizations
>work against that.

Good points.  The speedup is worth it to me, but it certainly adds 
complexity (though these are boring, straightforward optimizations that 
everyone does and that are described in lots of books; I ain't good 
enough to do gonzo :-)

>For example, did Python really need the Karatsuba
>multiplication gimmick?  I added the patch for it after a lot of reworking,
>but overall I'm not sure it was a net win;

It seems a good speedup to me, particularly if the cutoff is increased - on 
4000-8000 bit numbers it gives around 10-30% faster multiplication, and on 
bigger numbers it's wildly faster.
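(For anyone following along: the Karatsuba trick is that a double-width product costs three half-width products instead of four.  A toy sketch; the cutoff here is in bits and purely illustrative, and CPython's real cutoff is measured in its internal digits:)

```python
def karatsuba(x, y, cutoff=64):
    """Illustrative Karatsuba multiply of nonnegative integers.
    Below the cutoff, fall back to the ordinary product."""
    if x < (1 << cutoff) or y < (1 << cutoff):
        return x * y
    shift = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> shift, x & ((1 << shift) - 1)   # split x = xh*2^s + xl
    yh, yl = y >> shift, y & ((1 << shift) - 1)   # split y = yh*2^s + yl
    a = karatsuba(xh, yh)                         # high product
    b = karatsuba(xl, yl)                         # low product
    c = karatsuba(xh + xl, yh + yl) - a - b       # cross terms, one mul
    return (a << (2 * shift)) + (c << shift) + b
```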

>  and the current patch in question
>boosts the smallest integer at which Karatsuba triggers to such a large
>value that anyone reaching it would probably be much better off using a GMP
>wrapper anyway.

You'd be better off using GMP at pretty much any value.  But if you want to 
widely distribute a tool or product that uses crypto, it's a pain for every 
user to have to do that.


>One thing I noticed a few times in the patch was neglecting to check C calls
>for error returns.  Even _PyLong_NumBits() can fail!

Darn, thanks.  The 2 problems I see are in the long <-> byte-string code 
(which I left in the longobject.c patch), not the mod-exp code, which I 
went over more carefully.  I'll fix and re-submit the long <-> byte-string 
patch...


Trevor 



