[Python-Dev] license issues with profiler.py and md5.h/md5c.c
abo at minkirri.apana.org.au
Fri Feb 18 05:09:51 CET 2005
On Wed, 2005-02-16 at 22:53 -0800, Gregory P. Smith wrote:
> fyi - i've updated the python sha1/md5 openssl patch. it now replaces
> the entire sha and md5 modules with a generic hashes module that gives
> access to all of the hash algorithms supported by OpenSSL (including
> appropriate legacy interface wrappers and falling back to the old code
> when compiled without openssl).
> I don't quite like the module name 'hashes' that i chose for the
> generic interface (too close to the builtin hash() function). Other
> suggestions on a module name? 'digest' comes to mind.
I just had a quick look, and have these comments (psedo patch review?).
Apologies for the noise on the list...
This patch keeps the current md5c.c, md5module.c files and adds the
following; _hashopenssl.c, hashes.py, md5.py, sha.py.
The old md5 and sha extension modules get replaced by hashes.py, md5.py,
and sha.py python modules that leverage off _hash (openssl) or _md5 and
_sha (no openssl) extension modules.
The new _hash extension module "wraps" the high level openssl EVP
interface, which uses a string parameter to indicate what type of
message digest algorithm to use. The advantage of this is it makes all
openssl supported digests available, and if openssl adds more, we get
them for free. A disadvantage of this is it is an abstraction level
above the actual md5 and sha implementations, and this may add
overheads. These overheads are probably negligible compared to the
actual implementation speedups.
The new _md5 and _sha extension modules are simply re-named versions of
the old md5 and sha modules.
The hashes.py module acts as an import wrapper for _hash, and falls back
to using _md5 and _sha modules if _hash is not available. It provides an
EVP style API (string hash name parameter), that supports only md5 and
sha hashes if openssl is not available.
The new md5.py and sha.py modules simply use hash.py.
The introduction of a "hashes" module with a new API that supports many
different digests (provided openssl is available) is extending Python,
not just "fixing the licenses" of md5 and sha modules.
If all we wanted to do was fix the md5 module, a simpler solution would
be to change the md5c.c API to match openssl's implementation, and make
md5module.c use it, conditionally compiling against md5c.c or linking
against openssl in setup.py. A similar approach could be used for sha,
but would require stripping the sha implementation out of shamodule.c
I am mildly of concerned about the namespace/filespace clutter
introduced by this implementation... it feels unnecessary, as does the
tangled dependencies between them. With openssl, hashes.py duplicates
the functionality of _hash. Without openssl, md5.py and sha.py duplicate
_md5 and _sha, via a roundabout route through hash.py.
The python wrappers seem overly complicated, with things like
def new(name, string=None):
being common where the following would suffice;
I think this is because _hash.new() uses an optional string parameter,
but I have a feeling a C update with a zero length string is faster than
this Python if. If it was a concern, the C implementation could check
the value of the string length before calling update.
Given the convenience methods for different hashes in hashes.py (which
incidentally look like they are only available when _hash is not
available... something else that needs fixing), the md5.py module could
be simply coded as;
from hashes import md5
new = md5
Despite all these nit-picks, it looks pretty good. It is orders of
magnitude better than any of the other non-existent solutions, including
the one I didn't code :-)
Donovan Baarda <abo at minkirri.apana.org.au>
More information about the Python-Dev