Getting the directory size

Ken Seehof 12klat at sightreader.com
Thu Jun 1 20:12:12 EDT 2000


It seems unlikely that Python is the bottleneck here, since the algorithm is I/O
bound (i.e. most of the time is spent accessing your hard drive).  Steve's
suggestion will help, since you don't want to run getDirSize() (and with it the
whole directory walk) twice, but I don't think you'll be able to do any better
than that (even if you wrote an extension module).
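
A minimal sketch of that idea in modern Python 3. os.path.walk no longer exists there, so this uses os.walk instead, and the names get_dir_size, format_size and report are made up for this example. It walks the tree once, keeps the total, and formats it with the same thresholds as the code quoted below. Unlike os.path.walk, os.walk lists files and subdirectories separately, so only file sizes are summed here.

```python
import os


def get_dir_size(path):
    """Return the total size in bytes of all files below path."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                # getsize() performs one stat() call per file
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file vanished or cannot be stat'ed; skip it
    return total


def format_size(total):
    """Scale a byte count using the same '>' thresholds as the original."""
    for limit, unit in ((1024 ** 3, 'GB'), (1024 ** 2, 'MB'), (1024, 'KB')):
        if total > limit:
            return (round(total / limit, 2), unit)
    return (total, 'bytes')


def report(path):
    # Walk the tree once and reuse the result, instead of calling the
    # size function twice as the original main() does.
    size, unit = format_size(get_dir_size(path))
    print("Directory: %s" % path)
    print("Size: %s %s" % (size, unit))
```

From a script, `report(sys.argv[1])` prints the directory and its size after a single traversal.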

Note that if you right-click a large directory (such as C:\Windows) in
Explorer and click "Properties", NT takes a little while to compute the Size
value (the same value your algorithm produces).  Your algorithm (with
Steve's enhancement) should perform about as well as the Windows Properties
dialog, but be careful not to be fooled by caching effects: a second scan of
the same directory will typically be much faster than the first.
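
One way to see the caching effect for yourself is to time consecutive scans of the same tree. This is a sketch for Python 3; get_dir_size and time_runs are names invented for this example, and the timings you get depend entirely on your disk and on what the operating system already has cached.

```python
import os
import time


def get_dir_size(path):
    """Return the total size in bytes of all files below path."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file vanished or cannot be stat'ed; skip it
    return total


def time_runs(path, runs=2):
    """Scan the same tree several times in a row and time each scan.

    Returns a list of (total_bytes, seconds) pairs. The first scan is
    only 'cold' if the directory metadata is not already cached; later
    scans can usually be served from the file-system cache.
    """
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        total = get_dir_size(path)
        results.append((total, time.perf_counter() - start))
    return results
```

For example, `for total, secs in time_runs(r'C:\Windows'): print(total, round(secs, 2))` prints one line per scan; if the directory was not scanned recently, the first line usually shows the longest time.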

- Ken Seehof
kens at sightreader.com
www.sightreader.com/kens

Pieter Claerhout wrote:

> Hi all,
>
> I want to be able to find the total size of a directory on my NT machine.
> I started by writing it with the os.path.walk function, and it looks
> something like this:
>
> <code>
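> import os
> import stat
> import sys
>
> # Callback for os.path.walk: called once per directory with the list of
> # names in it; appends each entry's size to the list passed in as arg.
> def calcDirSize(arg, dir, files):
>         for file in files:
>                 stats = os.stat(os.path.join(dir, file))
>                 size = stats[stat.ST_SIZE]
>                 arg.append(size)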
>
> def getDirSize(dir):
>
>         sizes = []
>         os.path.walk(dir, calcDirSize, sizes)
>         total = 0
>         for size in sizes:
>                 total = total + size
>         if total > 1073741824:
>                 return (round(total/1073741824.0, 2), 'GB')
>         if total > 1048576:
>                 return (round(total/1048576.0, 2), 'MB')
>         if total > 1024:
>                 return (round(total/1024.0, 2), 'KB')
>         return (total, 'bytes')
>
> def main():
>
>         dir = sys.argv[1]
>
>         print "Testing directorySize..."
>         print "Directory: %s" %(dir)
>         print "Size: %s %s" %(getDirSize(dir)[0], getDirSize(dir)[1])
>
> if __name__ == '__main__':
>         main()
> </code>
>
> Is there any way to speed up this beauty? Depending on the number of
> directories it has to walk through, it takes a long time. Is there maybe
> a function in the win32 modules that does this for me, but a lot
> faster?
>
> Kind regards,
>
> Pieter Claerhout
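
None of the following existed in 2000, but in current Python (3.6 or later) one standard-library option is os.scandir. According to its documentation, DirEntry.stat() only needs a system call for symbolic links on Windows, because the directory listing already carries the size. The function name get_dir_size_scandir is made up for this example, and whether it is noticeably faster than the os.walk approach depends on the disk and the cache, so measure before relying on it.

```python
import os


def get_dir_size_scandir(path):
    """Sum the sizes of regular files below path using os.scandir.

    Symbolic links (to files or directories) are neither counted nor
    followed. Entries or directories that cannot be read are skipped.
    """
    total = 0
    stack = [path]  # explicit stack instead of recursion
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                        elif entry.is_file(follow_symlinks=False):
                            total += entry.stat(follow_symlinks=False).st_size
                    except OSError:
                        pass  # entry vanished or is unreadable; skip it
        except OSError:
            pass  # directory cannot be listed; skip it
    return total
```

The explicit stack keeps very deep trees from hitting Python's recursion limit; the result can be passed to the same GB/MB/KB formatting as before.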
