[Patches] [ python-Patches-1121611 ] sha and md5 modules should use OpenSSL when possible

SourceForge.net noreply at sourceforge.net
Sun Jun 12 22:35:52 CEST 2005


Patches item #1121611, was opened at 2005-02-12 20:33
Message generated for change (Comment added) made by tjreedy
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1121611&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Modules
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Gregory P. Smith (greg)
Assigned to: Gregory P. Smith (greg)
Summary: sha and md5 modules should use OpenSSL when possible

Initial Comment:
The md5 and sha (sha1) modules should use OpenSSL for
the algorithms when it is available, as its
implementations are much faster than Python's own.

Attached is an initial patch to use OpenSSL for the sha
module.  It's not ready for committing as is, but it is
set up to be a generic base for all OpenSSL hashes with
a little bit of work in the future.  Tossing this out
there for people to see how trivial it is and enjoy the
speedups.

diff is against HEAD but it should apply to 2.4 just fine.


----------------------------------------------------------------------

>Comment By: Terry J. Reedy (tjreedy)
Date: 2005-06-12 16:35

Message:
Logged In: YES 
user_id=593130

Re Doc page: As a somewhat naive (relative to the subject) 
reader, the title and first sentence implied that 'secure hash' 
and 'message digest' are two separate things, whereas, judging 
from the .digest() blurb, they both seem to be 16-byte hashes.  
So I would prefer this equivalence and the actual meaning were 
made clear at the top.  Something like "This module implements a 
common interface to several secure hash or message digest 
algorithms that produce 16-byte hashes."

If, as I presume, xx.hexdigest() == binascii.hexlify(xx.digest()), 
then I would say so and reference binascii for the 
interconversion functions one would need if one had the two 
versions to compare or needed to convert after the extraction.
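
A minimal sketch of the equivalence presumed above, assuming the hashlib
module as it ships in current Python 3 rather than the patch under review.
The helper name hex_forms_agree is hypothetical. In Python 3, hexdigest()
returns str while binascii.hexlify() returns bytes, so the sketch decodes
before comparing:

```python
import binascii
import hashlib


def hex_forms_agree(data):
    """Check that hexdigest() matches hexlify(digest()) for the given bytes."""
    h = hashlib.sha1(data)
    raw = h.digest()            # binary form of the hash
    text = h.hexdigest()        # hexadecimal form of the same hash
    # hexlify() converts binary to hex; unhexlify() converts back.
    same_hex = binascii.hexlify(raw).decode("ascii") == text
    same_raw = binascii.unhexlify(text) == raw
    return same_hex and same_raw


print(hex_forms_agree(b"some message"))
```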

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2005-06-12 08:18

Message:
Logged In: YES 
user_id=4771

On a side note, maybe it makes sense for a new module like this one to promote and use the modern (>=2.2) ways of defining C types.

What I have in mind is using tp_methods instead of Py_FindMethod, and generally not reverting to strcmp().  In this case, the constants like 'digest_size' would be best stored as class attributes instead, if possible.  Indeed, allowing expressions like "hashlib.md5.digest_size" conveys the idea that the result doesn't depend on a particular instance, unlike "hashlib.md5().digest_size".  (Of course class attributes are also readable from the instance, as usual.)

I can give it a try if you don't want to invest more time in this patch than you already did (for which we are grateful to you :-)
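
The class-attribute idea can be sketched in pure Python, assuming a
hypothetical wrapper class named Sha1Hash around today's hashlib.sha1. This
is only an illustration of the attribute lookup, not the C type the patch
defines:

```python
import hashlib


class Sha1Hash:
    """Hypothetical wrapper used only to illustrate class attributes."""

    # Stored on the class: the value does not depend on any instance.
    digest_size = 20

    def __init__(self, data=b""):
        self._h = hashlib.sha1(data)

    def update(self, data):
        self._h.update(data)

    def digest(self):
        return self._h.digest()


# Readable from the class itself and, as usual, from an instance too.
print(Sha1Hash.digest_size, Sha1Hash().digest_size)
```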

----------------------------------------------------------------------

Comment By: Gregory P. Smith (greg)
Date: 2005-06-11 23:21

Message:
Logged In: YES 
user_id=413

Ok, this patch is ready.  Documentation has been added. 
I'll bring it up on python-dev for discussion/approval with
a link to the htmlified documentation.

The speedups are great for any application hashing a lot of
data when OpenSSL is used.  It also adds sha224, sha256,
sha384 and sha512 support.
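
For illustration, the four added algorithms can be exercised as below. This
is a sketch against the hashlib module in current Python 3, which is an
assumption; the names available in the patch as submitted may differ:

```python
import hashlib

# Collect the digest size, in bytes, reported by each algorithm.
digest_sizes = {}
for name in ("sha224", "sha256", "sha384", "sha512"):
    h = hashlib.new(name, b"abc")
    digest_sizes[name] = h.digest_size
    print(name, h.digest_size, h.hexdigest()[:16] + "...")
```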

----------------------------------------------------------------------

Comment By: Gregory P. Smith (greg)
Date: 2005-03-12 20:13

Message:
Logged In: YES 
user_id=413

I linked a _hashlib.so library statically against OpenSSL
and reran the speed test.  No change.  That means it's not
shared library overhead causing the higher startup time but
just an artifact of the OpenSSL EVP interface.

Next up, analyze what size things common heavy sha1-using
applications regularly hash (BitTorrent and such).

----------------------------------------------------------------------

Comment By: Gregory P. Smith (greg)
Date: 2005-03-10 03:09

Message:
Logged In: YES 
user_id=413

The 007 patch improves the speed of the constructor.  There
is still a potential speed issue with the
constructor/destructor to work on:

greg@spiff src $ ./python Lib/test/test_hashlib_speed.py _sha
testing speed of old _sha legacy interface
0.06 seconds [20000 creations]
0.24 seconds [20000 "" digests]
0.15 seconds 20 x 106201 bytes [huge data]
0.15 seconds 200 x 10620 bytes [large data]
0.17 seconds 2000 x 1062 bytes [medium data]
0.35 seconds 20020 x 106 bytes [small data]
1.37 seconds 106200 x 20 bytes [digest_size data]
2.75 seconds 212400 x 10 bytes [tiny data]
greg@spiff src $ ./python Lib/test/test_hashlib_speed.py sha1
testing speed of hashlib.sha1 <built-in function openssl_sha1>
0.22 seconds [20000 creations]
0.57 seconds [20000 "" digests]
0.09 seconds 20 x 106201 bytes [huge data]
0.09 seconds 200 x 10620 bytes [large data]
0.15 seconds 2000 x 1062 bytes [medium data]
0.71 seconds 20020 x 106 bytes [small data]
3.39 seconds 106200 x 20 bytes [digest_size data]
6.70 seconds 212400 x 10 bytes [tiny data]

I suspect the cause is the shared OpenSSL library call
overhead, the OpenSSL EVP abstraction interface, or both.
The speed results are very similar to the above
regardless of which digest is used (the above was a Celeron
333 MHz running Linux).
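
A rough sketch of this kind of measurement, assuming current Python 3 and
hashlib. It is not the test_hashlib_speed.py script from the patch, and the
function name time_hashing is hypothetical. Many small inputs make the
per-object construction cost dominate, while a few large inputs make raw
hashing throughput dominate:

```python
import hashlib
import time


def time_hashing(constructor, count, size):
    """Return the seconds taken to hash `count` buffers of `size` bytes."""
    data = b"x" * size
    start = time.perf_counter()
    for _ in range(count):
        # Create a fresh hash object each time, as in the runs above.
        constructor(data).digest()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Counts and sizes borrowed from three of the runs quoted above.
    runs = [(20, 106201, "huge data"),
            (2000, 1062, "medium data"),
            (212400, 10, "tiny data")]
    for count, size, label in runs:
        elapsed = time_hashing(hashlib.sha1, count, size)
        print("%.2f seconds %d x %d bytes [%s]" % (elapsed, count, size, label))
```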


----------------------------------------------------------------------

Comment By: Gregory P. Smith (greg)
Date: 2005-03-03 16:15

Message:
Logged In: YES 
user_id=413

hashlib-006.patch adds fast constructors and a speed test. 
Documentation is the next step.

----------------------------------------------------------------------

Comment By: Gregory P. Smith (greg)
Date: 2005-03-01 04:14

Message:
Logged In: YES 
user_id=413

hashlib-005.patch now passes its test suite and no problems
appear in valgrind.

----------------------------------------------------------------------

Comment By: Gregory P. Smith (greg)
Date: 2005-02-28 13:11

Message:
Logged In: YES 
user_id=413

A much updated patch (hashlib-patch-004.patch).  It
incorporates some suggestions and includes sf patch
935454's sha256/224 and sha512/384 implementations.

Still not complete, but it shows the direction it's going in (I
see a segfault partway through the test suite after running
the sha512 tests).

As for the private modules being under another package, I
see no reason to do that since there aren't very many (how
does that work for binary modules anyway?).

----------------------------------------------------------------------

Comment By: Jim Jewett (jimjjewett)
Date: 2005-02-18 14:21

Message:
Logged In: YES 
user_id=764593

Should the private modules (such as _sha) be placed in a 
crypto package, instead of directly in the 
parent/everything library?

----------------------------------------------------------------------

Comment By: Gregory P. Smith (greg)
Date: 2005-02-17 01:46

Message:
Logged In: YES 
user_id=413

hashes-openssl-002.patch  replaces the sha and md5 modules
with a general hashes module that wraps all hashes that
OpenSSL supports.

Note that OpenSSL's implementations are much faster than the
previous Python versions, as it chooses versions optimized for
your particular hardware.

In case Python is compiled without OpenSSL, the hashes wrapper
falls back on the old Python sha and md5 module implementations.
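
The fallback pattern described here can be sketched as follows, assuming a
current CPython where the OpenSSL-backed extension is importable as _hashlib
and exposes openssl_sha1 (the name that appears in the speed-test output
elsewhere in this thread). The function name pick_sha1 is hypothetical, and
this is not the code from the patch:

```python
import hashlib


def pick_sha1():
    """Prefer the OpenSSL-backed constructor; otherwise use the fallback."""
    try:
        # Missing when the interpreter was built without OpenSSL.
        import _hashlib
        return _hashlib.openssl_sha1
    except (ImportError, AttributeError):
        # hashlib.sha1 is always present and uses the built-in implementation
        # when OpenSSL is unavailable.
        return hashlib.sha1


sha1 = pick_sha1()
print(sha1(b"abc").hexdigest())
```

Either branch yields an object with the same update()/digest()/hexdigest()
interface, so callers do not need to know which implementation was chosen.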

Side note: This may be sufficient for the Debian folks to
work around their random odd licensing issue.  Just have
Debian Python depend on OpenSSL; use this and remove the old
md5 module/code that wouldn't get used anyway.


----------------------------------------------------------------------
