python snippet request: calculate MD5 checksum on 650 MB ISO cdrom image quickly

Alexander Gavrilov gavrilov at iname.com
Tue Oct 24 21:18:56 EDT 2000


    OK. I played with your script a little and found some drawbacks. First,
when you warmed up the cache, everything that was read immediately became
garbage. Second, I think the system frees the cache when the file is
closed.
    So I corrected the script (the new script is attached):
       1. added a new function that only reads the file, without computing any checksum
       2. called this function to warm up the file cache, keeping the same file open for the timed runs
       3. tried various chunk sizes
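For readers on a current Python, the three changes can be sketched roughly as below. This is a minimal sketch, not the attached script: it assumes Python 3, uses `hashlib` in place of the old `md5` and `sha` modules and `time.perf_counter` in place of `time.clock`, and the helper `time_funcs` is a hypothetical name.

```python
import hashlib
import time


def simpleread(f, chunk):
    """Step 1: read the file to the end in `chunk`-sized pieces, computing nothing."""
    while f.read(chunk):
        pass


def getmd5(f, chunk):
    """Return the MD5 hex digest of the file, reading `chunk` bytes at a time."""
    m = hashlib.md5()
    while True:
        data = f.read(chunk)
        if not data:
            break
        m.update(data)
    return m.hexdigest()


def time_funcs(path, chunk_sizes, funcs=(simpleread, getmd5)):
    """Time each function at each chunk size; returns {(name, size): seconds}."""
    results = {}
    with open(path, "rb") as f:
        # Step 2: warm up the OS file cache with one read-only pass. The same
        # file object stays open for the timed runs, following the suspicion
        # that closing it might let the system drop the cached data.
        simpleread(f, 2**16)
        for size in chunk_sizes:          # step 3: try several chunk sizes
            for func in funcs:
                f.seek(0)                 # rewind before every timed pass
                start = time.perf_counter()
                func(f, size)
                results[(func.__name__, size)] = time.perf_counter() - start
    return results
```

For example, `time_funcs("image.iso", [2**12, 2**16, 2**20])`, where `image.iso` is a placeholder path, returns one timing per function and chunk size. Keeping the read-only pass in the same loop gives a baseline to compare the checksum passes against.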

    The results follow. The file size is 95823736 bytes. My system has 256
MB of RAM, so I presume the whole file is in the cache. CRC is indeed a
little faster than MD5, but not radically. Note also that while the system
penalizes you for frequent reads with small chunks, the cost of memory
allocation begins to play a significant role with big chunks.
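One way to sidestep the per-chunk allocation cost mentioned above, on a modern Python, is to read into a single preallocated buffer. This is a swapped-in technique, not something the scripts in this thread do; it assumes Python 3, where `readinto` fills an existing `bytearray` and `hashlib` objects accept a `memoryview` slice. The function name `md5_readinto` is hypothetical.

```python
import hashlib


def md5_readinto(path, chunk=2**16):
    """Return the MD5 hex digest of a file, reusing one buffer for every read."""
    m = hashlib.md5()
    buf = bytearray(chunk)       # allocated once instead of once per read
    view = memoryview(buf)       # slicing a memoryview does not copy the data
    # buffering=0 gives a raw FileIO, so readinto() fills buf directly.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(buf)  # number of bytes read; 0 at end of file
            if not n:
                break
            m.update(view[:n])   # hash only the bytes actually read
    return m.hexdigest()
```

This removes the allocation but not the reads themselves, so whether it is measurably faster depends on the platform and Python version; it is worth timing next to a plain `read()` loop rather than assuming a gain.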

Times in seconds:

Chunk size                simpleread  getcrc32    getmd5    getsha
2**8                             6.2     13.31    14.105    17.765
2**10                          4.432     9.013    10.809    14.124
2**12                          2.369     5.479     8.013    10.831
2**16                            2.7     5.487     8.258    10.835
2**18                          2.898     5.707     8.443    10.972
2**20                            4.2     7.271    10.049    12.678
2**25                            4.7     8.038    10.793    13.466
95823737 (file size + 1)         5.6     8.928    11.694    14.371
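Because `simpleread` performs the same reads without any checksum, subtracting its time from the others gives a rough estimate of what each checksum alone costs. The sketch below does that arithmetic on the 2**12 row, the fastest one; it assumes the read cost is identical in every pass, which a single run cannot guarantee, and `checksum_overhead` is a hypothetical helper name.

```python
FILE_SIZE = 95823736  # bytes, from the post

# Times in seconds from the 2**12 row of the results above.
ROW_2_12 = {"simpleread": 2.369, "getcrc32": 5.479, "getmd5": 8.013, "getsha": 10.831}


def checksum_overhead(row, file_size):
    """Subtract the read-only baseline from each checksum time.

    Returns {name: (extra_seconds, MiB_per_second)} for every entry except
    the baseline itself.
    """
    baseline = row["simpleread"]
    out = {}
    for name, secs in row.items():
        if name == "simpleread":
            continue
        extra = secs - baseline                  # time attributed to the checksum
        out[name] = (extra, file_size / extra / 2**20)
    return out


for name, (extra, rate) in checksum_overhead(ROW_2_12, FILE_SIZE).items():
    print(f"{name}: {extra:.3f} s extra, about {rate:.1f} MiB/s")
```

On that row this works out to roughly 29 MiB/s for CRC32, 16 MiB/s for MD5 and 11 MiB/s for SHA; treat these as rough single-run figures from one machine.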

"Tim Peters" <tim_one at email.msn.com> wrote in message
news:mailman.972427267.4483.python-list at python.org...
> [Alexander Gavrilov]
> > I ran the test script Tim posted and got the following results on my system:
> >
> > Timing C:\Program Files\Microsoft Visual
> > Studio\MSDN98\98VS\1033\MSDNVS98.CHQ w/ size 95823736
> > Time for getcrc32 is 29.852
> > Time for getmd5 is 29.444
> > Time for getsha is 31.513
> >
> > The system is Windows NT4 SP6, Dual Pentium II 300 MHz
>
> The program as posted used a 32 MB chunk size, so your memory configuration
> is more interesting than anything else.  Play with this line:
>
> > >     func(f, 2**25)
>
> and see how it changes.  The sweet spot for you is probably substantially
> lower.


Attachment: test.py

    import md5, binascii, sha, time, os

    def simpleread(f, CHUNK=2**16):
        result = 0
        while 1:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            del chunk
        return result

    def getcrc32(f, CHUNK=2**16):
        result = 0
        while 1:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            result = binascii.crc32(chunk, result)
            del chunk
        return result

    def getmd5(f, CHUNK=2**16):
        m = md5.new()
        while 1:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            m.update(chunk)
            del chunk
        return m.digest()

    def getsha(f, CHUNK=2**16):
        m = sha.new()
        while 1:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            m.update(chunk)
            del chunk
        return m.digest()

    TEST = r"C:\Program Files\Microsoft Visual Studio\MSDN98\98VS\1033\MSDNVS98.CHQ"

    print "Timing", TEST, "w/ size", os.path.getsize(TEST)

    CHUNK_SIZE = 2**10
    #CHUNK_SIZE = 95823736+1

    # Warm up the system file cache, to avoid penalizing the first func.
    f = open(TEST, "rb")
    simpleread(f,CHUNK_SIZE)

    for func in simpleread, getcrc32, getmd5, getsha:
        f.seek(0,0)
        start = time.clock()
        func(f, CHUNK_SIZE)
        finish = time.clock()
        print "Time for", func.__name__, "is", round(finish - start, 3)




More information about the Python-list mailing list