[Python-3000] binascii.crc32 vs zlib.crc32

Gregory P. Smith greg at krypto.org
Mon Mar 24 01:17:23 CET 2008


trunk r61823 now compiles binascii to use zlib's crc32 rather than its own
if the zlib library is available at compile time (that should shave a
kilobyte off binascii.so on most systems).
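
A quick sanity check that the two entry points are interchangeable, as a sketch only (it is not part of the commit, and it assumes a Python 3 build where both modules are available):

```python
import binascii
import zlib

data = b"abcdefghijklmnop" * 10  # same payload as the timings quoted below

# Mask to 32 bits so the comparison does not depend on signed vs. unsigned results.
crc_binascii = binascii.crc32(data) & 0xFFFFFFFF
crc_zlib = zlib.crc32(data) & 0xFFFFFFFF
assert crc_binascii == crc_zlib

# Both functions accept a running value as the optional second argument.
head, tail = data[:40], data[40:]
assert (binascii.crc32(tail, binascii.crc32(head)) & 0xFFFFFFFF) == crc_zlib
```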

{binascii,zlib}.crc32, zlib.adler32 and binascii.crc_hqx could all
potentially live in the hashlib module, but their APIs are notably
different: they return a number rather than a string (leaving
serialization byte order up to the user), and they are functions that take
the start value as an argument rather than objects that maintain internal
state to be retrieved when done adding data.  It'd be easy enough to wrap
them in Python to present the same API, but that'd destroy any speed benefit.
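
To make the API difference concrete, here is a minimal sketch of what such a wrapper could look like.  The class name Crc32Hash and the big-endian digest are assumptions for illustration; nothing like this exists in hashlib, and the extra Python-level call per update is the overhead mentioned above.

```python
import struct
import zlib


class Crc32Hash:
    """Hypothetical hashlib-style wrapper around zlib.crc32 (illustration only)."""

    name = "crc32"
    digest_size = 4

    def __init__(self, data=b""):
        # zlib.crc32(b"") is 0, which is also the default start value.
        self._crc = zlib.crc32(data) & 0xFFFFFFFF

    def update(self, data):
        # The running checksum is passed back in as the start value.
        self._crc = zlib.crc32(data, self._crc) & 0xFFFFFFFF

    def digest(self):
        # The wrapper has to pick a byte order; big-endian is assumed here.
        return struct.pack(">I", self._crc)

    def hexdigest(self):
        return "%08x" % self._crc


h = Crc32Hash()
h.update(b"hello ")
h.update(b"world")
assert h.digest() == struct.pack(">I", zlib.crc32(b"hello world") & 0xFFFFFFFF)
```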

The hashlib documentation and docstring mention that people should look at
zlib if they want crc32 or adler32.  IMHO that's enough even if zlib (or
binascii) is a "weird" place in an overly logical sense.  They're not really
the same class of hash function.

-g

On Thu, Mar 20, 2008 at 12:02 AM, Guido van Rossum <guido at python.org> wrote:

> Hm.  zlib is an odd place to find this API (unless you know way more
> about gzip than healthy :-).  Though binascii isn't much better.  I'd
> rather expect this in the vicinity of md5 and sha... Is it possible to
> tweak that C code to use the zlib version if present and the old C
> code otherwise?
>
> On Tue, Mar 18, 2008 at 3:21 PM, Gregory P. Smith <greg at krypto.org> wrote:
> > Both modules have a crc32 function.  The zlib version is faster when zlib
> > has been compiled optimally or about the same when zlib is old or uses its C
> > code.
> >
> > Should we ditch the binascii.crc32 version in py3k?
> >
> >  64bit Linux (CentOS 5.1):
> >
> > $ python2.4 -m timeit 'foo="abcdefghijklmnop"*10' 'import binascii as mod' \
> >     'f = mod.crc32' 'for x in xrange(100000): f(foo)'
> >  10 loops, best of 3: 108 msec per loop
> > $ python2.4 -m timeit 'foo="abcdefghijklmnop"*10' 'import zlib as mod' \
> >     'f = mod.crc32' 'for x in xrange(100000): f(foo)'
> >  10 loops, best of 3: 40.5 msec per loop
> >
> > 32bit MacOS X 10.4:
> >
> > % python2.3 /usr/lib/python2.3/timeit.py 'foo="abcdefghijklmnop"*10' \
> >     'import binascii as mod' 'f = mod.crc32' 'for x in xrange(100000): f(foo)'
> >  10 loops, best of 3: 7.37e+04 usec per loop
> > % python2.3 /usr/lib/python2.3/timeit.py 'foo="abcdefghijklmnop"*10' \
> >     'import zlib as mod' 'f = mod.crc32' 'for x in xrange(100000): f(foo)'
> >  10 loops, best of 3: 4.62e+04 usec per loop
> >
> > Removal from binascii would break things for platforms or embedded systems
> > wanting crc32 that don't want to include zlib.  Anyone care?
> >
> > What about 2.x?  If we remove the redundancy in py3k, I guess we deprecate
> > binascii.crc32 and remove it in 2.7?
> >
> > -gps
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)