Speed ain't bad

Bulba! bulba at bulba.com
Thu Dec 30 19:41:13 EST 2004


One of the posters inspired me to profile my newbie script
(pasted below). After taking measurements, I found that Python's speed,
at least in the area where my script works, is surprisingly
high.

This is the experiment: a script recreates the folder hierarchy
somewhere else and stores there the compressed versions of the
files from the source hierarchy (the script makes additional backups
of the file server's disk at the company where I work onto other
disks, compressing them to save space). The data was:

468 MB, 15057 files, 1568 folders
(machine: win2k, python v2.3.3)
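The folder-mirroring part of the experiment comes down to mapping each source file path onto a path under the target directory. A minimal sketch of that mapping in current Python, assuming the same scheme the script below uses (drop the drive letter, append the rest of the absolute source path to the target directory, add `.zip`). The name `target_zip_path` is hypothetical, and `ntpath`/`posixpath` are passed explicitly only so the Windows behaviour can be checked on any OS:

```python
import ntpath
import posixpath


def target_zip_path(source_file, target_dir, pathmod=ntpath):
    """Hypothetical helper: map an absolute source file path to its .zip
    path under target_dir, mirroring the source folder tree."""
    # Drop the drive ("C:"); on POSIX the drive part is always empty.
    _drive, rest = pathmod.splitdrive(source_file)
    # Strip leading separators so join() does not treat `rest` as absolute.
    rest = rest.lstrip(pathmod.sep + (pathmod.altsep or ""))
    return pathmod.join(target_dir, rest + ".zip")


print(target_zip_path(r"C:\data\docs\report.doc", r"D:\backup"))
# D:\backup\data\docs\report.doc.zip
print(target_zip_path("/srv/data/report.doc", "/backup", pathmod=posixpath))
# /backup/srv/data/report.doc.zip
```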

WinRAR v3.20 (ZIP format, "normal" compression setting) needed
119 seconds to compress all of that.

The Python script's time (running under the profiler) was, drumroll...

198 seconds.

Note that the Python script had to laboriously recreate the tree of
1568 folders and create over 15 thousand compressed files, so
it actually had more work to do than WinRAR did. The compressed
data came out at basically the same size, about 207 MB.
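To put these figures side by side, here is plain arithmetic on the numbers quoted above. Keep in mind that the Python time was measured under the profiler, which adds overhead of its own, so the unprofiled gap is probably somewhat smaller:

```python
# Figures quoted in the post
size_mb = 468.0      # original data
zipped_mb = 207.0    # compressed output (about the same for both tools)
winrar_s = 119.0
python_s = 198.0     # measured under the profiler

print("Python/WinRAR time ratio: %.2f" % (python_s / winrar_s))      # 1.66
print("WinRAR throughput: %.2f MB/s" % (size_mb / winrar_s))         # 3.93
print("Python throughput: %.2f MB/s" % (size_mb / python_s))         # 2.36
print("Compressed to %.0f%% of original" % (100 * zipped_mb / size_mb))  # 44%
```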

I find it very encouraging that, in a real-world area of application,
a newbie script written in a very high-level language can perform
not that far from a "shrinkwrap" professional archiver (WinRAR is an
excellent archiver, both in compression and in speed). I do realize
that this is mainly the result of all the "underlying infrastructure"
of Python. Great work, guys. Congrats.
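Much of that "underlying infrastructure" is C code: with `ZIP_DEFLATED`, `zipfile` compresses through the `zlib` module, and each member's data is a raw DEFLATE stream. A small check of that, written for current Python 3 (`io.BytesIO` and `with` postdate Python 2.3). It builds an archive in memory, pulls one member's compressed bytes out by reading the 30-byte local file header, and decompresses them directly with `zlib`; the sample data and member name are arbitrary:

```python
import io
import struct
import zipfile
import zlib

data = b"Speed ain't bad. " * 500

# Build a one-member ZIP archive in memory with DEFLATE compression.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("sample.txt", data)

raw = buf.getvalue()
with zipfile.ZipFile(io.BytesIO(raw)) as reader:
    info = reader.getinfo("sample.txt")

# Local file header: 30 fixed bytes; the file-name length and extra-field
# length are little-endian uint16 values at offsets 26 and 28.
off = info.header_offset
name_len, extra_len = struct.unpack("<HH", raw[off + 26:off + 30])
start = off + 30 + name_len + extra_len
payload = raw[start:start + info.compress_size]

# wbits=-15 tells zlib to expect a raw DEFLATE stream (no zlib header).
restored = zlib.decompress(payload, -15)
print(restored == data, len(data), info.compress_size)
```

The member's bytes decompress with zlib alone, which is why the profile below attributes so much time to a single `write` call per file: the Python layer only reads the file and feeds it to zlib.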

The only thing missing from this picture is whether my script
could be optimised further (not that I actually need better
performance; I'm just curious what the possible solutions are).

Any takers among the experienced guys?



Profiling results:

>>> p3.sort_stats('cumulative').print_stats(40)
Fri Dec 31 01:04:14 2004    p3.tmp

         580543 function calls (568607 primitive calls) in 198.124 CPU seconds

   Ordered by: cumulative time
   List reduced from 69 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.013    0.013  198.124  198.124 profile:0(z3())
        1    0.000    0.000  198.110  198.110 <string>:1(?)
        1    0.000    0.000  198.110  198.110 <interactive input>:1(z3)
        1    1.513    1.513  198.110  198.110 zmtree3.py:26(zmtree)
    15057   14.504    0.001  186.961    0.012 zmtree3.py:7(zf)
    15057  147.582    0.010  148.778    0.010 C:\Python23\lib\zipfile.py:388(write)
    15057   12.156    0.001   12.156    0.001 C:\Python23\lib\zipfile.py:182(__init__)
    32002    7.957    0.000    8.542    0.000 C:\PYTHON23\Lib\ntpath.py:266(isdir)
13826/1890    2.550    0.000    8.143    0.004 C:\Python23\lib\os.py:206(walk)
    30114    3.164    0.000    3.164    0.000 C:\Python23\lib\zipfile.py:483(close)
    60228    1.753    0.000    2.149    0.000 C:\PYTHON23\Lib\ntpath.py:157(split)
    45171    0.538    0.000    2.116    0.000 C:\PYTHON23\Lib\ntpath.py:197(basename)
    15057    1.285    0.000    1.917    0.000 C:\PYTHON23\Lib\ntpath.py:467(abspath)
    33890    0.688    0.000    1.419    0.000 C:\PYTHON23\Lib\ntpath.py:58(join)
   109175    0.783    0.000    0.783    0.000 C:\PYTHON23\Lib\ntpath.py:115(splitdrive)
    15057    0.196    0.000    0.768    0.000 C:\PYTHON23\Lib\ntpath.py:204(dirname)
    33890    0.433    0.000    0.731    0.000 C:\PYTHON23\Lib\ntpath.py:50(isabs)
    15057    0.544    0.000    0.632    0.000 C:\PYTHON23\Lib\ntpath.py:438(normpath)
    32002    0.431    0.000    0.585    0.000 C:\PYTHON23\Lib\stat.py:45(S_ISDIR)
    15057    0.555    0.000    0.555    0.000 C:\Python23\lib\zipfile.py:149(FileHeader)
    15057    0.483    0.000    0.483    0.000 C:\Python23\lib\zipfile.py:116(__init__)
      151    0.002    0.000    0.435    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:171(write)
      151    0.002    0.000    0.432    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:489(write)
      151    0.013    0.000    0.430    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:461(HandleOutput)
       76    0.087    0.001    0.405    0.005 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:430(QueueFlush)
    15057    0.239    0.000    0.340    0.000 C:\Python23\lib\zipfile.py:479(__del__)
    15057    0.157    0.000    0.157    0.000 C:\Python23\lib\zipfile.py:371(_writecheck)
    32002    0.154    0.000    0.154    0.000 C:\PYTHON23\Lib\stat.py:29(S_IFMT)
       76    0.007    0.000    0.146    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:262(dowrite)
       76    0.007    0.000    0.137    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\formatter.py:221(OnStyleNeeded)
       76    0.011    0.000    0.118    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\interact.py:197(Colorize)
       76    0.110    0.001    0.112    0.001 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:69(SCIInsertText)
       76    0.079    0.001    0.081    0.001 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:333(GetTextRange)
       76    0.018    0.000    0.020    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:296(SetSel)
       76    0.006    0.000    0.018    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\document.py:149(__call__)
      227    0.003    0.000    0.012    0.000 C:\Python23\lib\Queue.py:172(get_nowait)
       76    0.007    0.000    0.011    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\interact.py:114(ColorizeInteractiveCode)
      532    0.011    0.000    0.011    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:330(GetTextLength)
       76    0.001    0.000    0.010    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\view.py:256(OnBraceMatch)
     1888    0.009    0.000    0.009    0.000 C:\PYTHON23\Lib\ntpath.py:245(islink)
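Reading the listing, `zipfile.py:388(write)` accounts for 148.8 of the 198.1 cumulative seconds (about 75%), so most of the time goes into reading and compressing file contents; opening 15057 separate archives (`zipfile.py:182(__init__)`) adds another 12.2 seconds. The session above used the pure-Python `profile` module (hence `profile:0(z3())`), and `p3` is a `pstats.Stats` object loaded from the saved file `p3.tmp`. A rough equivalent in current Python uses `cProfile`, which has much lower overhead. `work` below is a hypothetical stand-in for the post's `z3()` wrapper, whose body isn't shown:

```python
import cProfile
import os
import pstats
import tempfile
import zlib


def work():
    """Hypothetical stand-in for z3(): compress some bytes repeatedly."""
    block = bytes(range(256)) * 16
    total = 0
    for _ in range(200):
        total += len(zlib.compress(block))
    return total


# Profile one call and save the raw data to a file, like p3.tmp in the post.
profiler = cProfile.Profile()
result = profiler.runcall(work)
fd, stats_path = tempfile.mkstemp(suffix=".prof")
os.close(fd)
profiler.dump_stats(stats_path)

# Load the file back (Stats reads it immediately), then print the
# 10 entries with the largest cumulative time.
p3 = pstats.Stats(stats_path)
os.remove(stats_path)
p3.sort_stats("cumulative").print_stats(10)
```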


---
Script:

#!/usr/bin/python

import os
import sys
from zipfile import ZipFile, ZIP_DEFLATED

def zf(sfpath, targetdir):
    """Compress one file into targetdir, mirroring its source path.
    Returns (original size, compressed size) in bytes."""
    # Drop the drive letter ("C:") on Windows; a no-op elsewhere.
    tgfpath = os.path.splitdrive(sfpath)[1]
    zfdir = os.path.dirname(os.path.abspath(targetdir) + tgfpath)
    zfname = os.path.basename(tgfpath)
    zfpath = os.path.join(zfdir, zfname + '.zip')
    if not os.path.isdir(zfdir):
        os.makedirs(zfdir)
    archive = ZipFile(zfpath, 'w', ZIP_DEFLATED)
    # Store the member under its original name, not "name.zip".
    archive.write(sfpath, zfname, ZIP_DEFLATED)
    archive.close()
    ssize = os.stat(sfpath).st_size
    zsize = os.stat(zfpath).st_size
    return (ssize, zsize)


def zmtree(sdir, tdir):
    """Compress every file under sdir into a mirrored tree under tdir."""
    n = 0
    ssize = 0
    zsize = 0
    sys.stdout.write('\n ')
    for root, dirs, files in os.walk(sdir):
        for file in files:
            res = zf(os.path.join(root, file), tdir)
            ssize += res[0]
            zsize += res[1]
            n = n + 1
            #sys.stdout.write('.')
            # Progress report every 200 files, in MB (1048576 bytes).
            if n % 200 == 0:
                print "  %.2fM (%.2fM)" % (ssize/1048576.0, zsize/1048576.0)
                #sys.stdout.write(' ')
    return (n, ssize, zsize)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        if os.path.isdir(sys.argv[1]) and os.path.isdir(sys.argv[2]):
            (n, ssize, zsize) = zmtree(os.path.abspath(sys.argv[1]),
                                       os.path.abspath(sys.argv[2]))
            print ("\n\n Summary:\n"
                   "  Number of files compressed: %d\n"
                   "  Total size of original files: %.2fM\n"
                   "  Total size of compressed files: %.2fM"
                   % (n, ssize/1048576.0, zsize/1048576.0))
            sys.exit(0)
        else:
            print "Incorrect arguments."
            if not os.path.isdir(sys.argv[1]):
                print sys.argv[1] + " is not a directory."
            if not os.path.isdir(sys.argv[2]):
                print sys.argv[2] + " is not a directory."
    print "\n Usage:\n " + sys.argv[0] + " source-directory target-directory"
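For readers on current Python 3, here is a sketch of the same per-file step using a `with` block, so the archive is always closed. It keeps the original file name as the member name. The name `zip_one_file` is hypothetical. Like the script, it drops the drive letter and mirrors the absolute source path under the target directory. `os.makedirs(..., exist_ok=True)` (Python 3.2+) takes the place of the separate `isdir()` check:

```python
import os
import zipfile


def zip_one_file(source_path, target_dir):
    """Hypothetical Python 3 port of zf(): compress source_path into
    <target_dir>/<source path without drive>.zip.
    Returns (original size, archive size) in bytes."""
    source_path = os.path.abspath(source_path)
    # Drop the drive ("C:") on Windows, then strip leading separators
    # so os.path.join() does not treat the remainder as absolute.
    rel = os.path.splitdrive(source_path)[1].lstrip(os.sep + (os.altsep or ""))
    zip_path = os.path.join(os.path.abspath(target_dir), rel + ".zip")
    os.makedirs(os.path.dirname(zip_path), exist_ok=True)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Store the member under the plain file name.
        archive.write(source_path, os.path.basename(source_path))
    return os.path.getsize(source_path), os.path.getsize(zip_path)
```

`zmtree()` would call it in place of `zf()` unchanged, since it returns the same `(ssize, zsize)` pair.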





--
It's a man's life in a Python Programming Association.



More information about the Python-list mailing list