Optimizing code

Gerrit Holl gerrit.holl at pobox.com
Fri Feb 25 01:48:54 EST 2000


<quote name="Harald Hanche-Olsen" date="951408593">
> + Gerrit Holl <gerrit.holl at pobox.com>:
> 
> | class DiskUsage:
> |     __size = 0
> |     def add(self, filename):
> |         self.__size = self.__size + os.path.getsize(filename)
...
> |     def __len__(self):
> |         return self.__size

> | Timing shows that the 'os.path.walk' part takes about 2.7
> | seconds, for a 400 MB dir with 1096 dirs and 9082 files. 'du -s ~'
> | takes 0.2 seconds.  What makes this slow? The special methods? The
> | redefinition of an integer?  os.path.walk? With longs, it even takes
> | 12 seconds...
> 
> One thing that slows your code down is that it calls stat() three
> times on every regular file in the tree:  First, in os.path.isfile,
> second, in os.path.getsize, and third, in os.path.walk, which needs to
> find out if a filename corresponds to a directory or not.

I see.
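
To see the repeated stat() calls for myself, something like the rough
sketch below could count them. It assumes a modern Python 3 on a POSIX
system, where os.path.isdir, os.path.isfile and os.path.getsize each go
through os.stat. The helper name count_stat_calls is made up for
illustration, and os.path.isdir only stands in for the directory test
that os.path.walk performs.

```python
import os
import tempfile

def count_stat_calls(path):
    """Count os.stat calls made while checking one path the slow way.

    Hypothetical helper: it temporarily replaces os.stat with a
    counting wrapper and restores the real function afterwards.
    """
    calls = []
    real_stat = os.stat

    def counting_stat(*args, **kwargs):
        calls.append(args[0])
        return real_stat(*args, **kwargs)

    os.stat = counting_stat
    try:
        os.path.isdir(path)        # stand-in for the tree walker's directory test
        if os.path.isfile(path):   # the "is it a regular file?" test
            os.path.getsize(path)  # the size lookup
    finally:
        os.stat = real_stat        # always restore the real os.stat
    return len(calls)

with tempfile.TemporaryDirectory() as tmp:
    name = os.path.join(tmp, "sample.txt")
    with open(name, "wb") as f:
        f.write(b"hello")
    print(count_stat_calls(name))
```

Under those assumptions each of the three checks should register one
call, which matches the explanation above.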

> | Can I optimize it? If so how?
> 
> Here is my best effort so far.  It is nearly three times as fast as
> yours (but less portable perhaps).  Well, actually yours didn't work
> at all on my system, because the length of a file is a long integer:
> 
>   File "du.py", line 21, in du
>     return len(disk)
> TypeError: __len__() should return an int

A long? I don't see any long. Perhaps you are running a "future"
version of Python that silently converts ints to longs?

And, by the way, why can't len() return a long integer?
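
For illustration only, here is a small sketch of how the restriction
looks in a current CPython 3, which is an assumption on my part and not
the interpreter discussed in this thread. There, len() accepts any
value that fits in sys.maxsize and raises OverflowError beyond that, so
an ordinary method (called size() here, a name I picked) is the safer
way to report a byte count.

```python
import sys

class DiskUsage:
    """Toy stand-in for the class in this thread; only holds a size."""

    def __init__(self, size):
        self._size = size

    def __len__(self):
        # len() only accepts values that fit in an index-sized integer.
        return self._size

    def size(self):
        # An ordinary method has no such limit.
        return self._size

small = DiskUsage(400 * 1024 * 1024)
print(len(small))  # 419430400

huge = DiskUsage(sys.maxsize + 1)
try:
    len(huge)
except OverflowError as exc:
    print("len() refused:", exc)
print(huge.size() == sys.maxsize + 1)  # True
```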

> class DiskUsage:
...
>     def __call__(self, dir):
>         # Importing these names is possibly a useless optimization:
>         from stat import S_ISDIR, S_ISREG, ST_MODE, ST_SIZE
>         files = os.listdir(dir)
>         dirs = []
>         for file in files:
>             filename = os.path.join(dir, file)
>             s = os.lstat(filename)
>             mode = s[ST_MODE]
>             if S_ISDIR(mode):
>                 dirs.append(filename)
>             elif S_ISREG(mode):
>                 self.__size = self.__size + s[ST_SIZE]
>         for dir in dirs:
>             self(dir)
>     def len(self):
>         return self.__size
...

Interesting, thanks!
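
The same one-stat-per-entry idea could be sketched in a modern Python 3
with os.scandir, which is an assumption on my part since it is not what
the code above uses. The function name disk_usage is made up; like the
lstat() version it does not follow symbolic links and counts only
regular files.

```python
import os
import tempfile

def disk_usage(top):
    """Sum the sizes of regular files under top, not following symlinks."""
    total = 0
    pending = [top]  # directories still to be scanned
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    # The DirEntry caches its stat result, so the size
                    # lookup costs at most one system call per entry.
                    total += entry.stat(follow_symlinks=False).st_size
    return total

# Small usage example on a throwaway tree: 5 bytes + 7 bytes.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.bin"), "wb") as f:
        f.write(b"hello")
    with open(os.path.join(tmp, "sub", "b.bin"), "wb") as f:
        f.write(b"goodbye")
    print(disk_usage(tmp))  # 12
```

An explicit list of pending directories replaces the recursive
self(dir) call, which avoids deep recursion on very nested trees.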

regards,
Gerrit.

-- 
Comparison Python GUI's: http://www.nl.linux.org/~gerrit/gui.html
Please comment!



