Need "sum" compatible checksum16 in Python

Alex Martelli aleaxit at yahoo.com
Sat May 12 07:30:33 EDT 2001


"B. Douglas Hilton" <doug.hilton at engineer.com> wrote in message
news:3AFCF5EA.E181DCA1 at engineer.com...
> Here is the relevant excerpt from sum.c from GNU textutils:
>
>   while ((ch = getc (fp)) != EOF)
>     {
>       total_bytes++;
>       ROTATE_RIGHT (checksum);
>       checksum += ch;
>       checksum &= 0xffff;       /* Keep it within bounds. */
>     }
>
> This seems pretty trivial. In C there is no problem with
> adding characters as integers, but I have found that it is
> a big problem in Python. Maybe I need to make a c-lib
> function from this and link with Python?

For speed, you may choose to do that, but there's really
no need in terms of pure functionality, even considering
the definition of rotate_right you posted later as:

#define ROTATE_RIGHT(c) if ((c) & 01) (c) = ((c) >>1) + 0x8000; else (c) >>= 1;

The builtin array module helps -- something like the
following untested code should work fine:

import array

def checksum(fileobj):
    filedata = array.array('B', fileobj.read())
    totbytes = len(filedata)
    result = 0
    def rotate_right(c):
        if c&1: return (c>>1)|0x8000
        else: return c>>1
    for ch in filedata:
        result = (rotate_right(result)+ch) & 0xFFFF
    return result

but you don't HAVE to use array -- it should also
work with something like:

def checksum(fileobj):
    totbytes = 0
    result = 0
    def rotate_right(c):
        if c&1: return (c>>1)|0x8000
        else: return c>>1
    for ch in fileobj.read():
        ch = ord(ch) & 0xFF
        result = (rotate_right(result)+ch) & 0xFFFF
    return result

ord() on a one-character string returns 0 to 255, never
a negative value, so the &0xFF is only a harmless
safeguard.  If the file is huge you may also read it in
slices, of course, with either the array approach or the
direct loop over the data.
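
Reading in slices might look roughly like the sketch below.  It
is only an illustration: the name checksum_chunked, the chunk_size
parameter and its default are my own choices, not anything from
sum.c, and the file is assumed to be opened in binary mode.  The
running checksum is carried across slices, so the result should
not depend on the slice size.

```python
import io

def checksum_chunked(fileobj, chunk_size=65536):
    """16-bit rotating checksum, reading the file in slices.

    chunk_size is an arbitrary illustrative default; fileobj is
    assumed to be a binary file-like object.
    """
    result = 0
    total_bytes = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:                      # empty read means end of file
            break
        total_bytes += len(chunk)
        for ch in bytearray(chunk):        # bytearray yields ints 0..255
            # rotate right by one bit within 16 bits
            result = (result >> 1) | ((result & 1) << 15)
            # add the byte and keep the sum within 16 bits
            result = (result + ch) & 0xFFFF
    return result, total_bytes

# Example on an in-memory "file":
print(checksum_chunked(io.BytesIO(b"ab")))   # (32914, 2)
```

The byte count is returned too, since the C loop keeps track of
total_bytes alongside the checksum.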


I believe your worry that Python "may be too high level
for this computation" is misplaced.  It's easy in Python
to get down & dirty with the bits, if and when you want
to -- array and struct help, but you may do without them
if you want.  It's all an issue of speed... low-level stuff
often needs to be very speedy, so recoding it as a C-coded
extension may be worth the bother.  But you can at least
prototype it in Python without breaking a sweat:-).


Alex





