Optimizing code
Gerrit Holl
gerrit.holl at pobox.com
Fri Feb 25 01:48:54 EST 2000
<quote name="Harald Hanche-Olsen" date="951408593">
> + Gerrit Holl <gerrit.holl at pobox.com>:
>
> | class DiskUsage:
> |     __size = 0
> |     def add(self, filename):
> |         self.__size = self.__size + os.path.getsize(filename)
...
> |     def __len__(self):
> |         return self.__size
> | Timing shows that the 'os.path.walk' part takes about 2.7
> | seconds, for a 400 MB dir with 1096 dirs and 9082 files. 'du -s ~'
> | takes 0.2 seconds. What makes this slow? The special methods? The
> | redefinition of an integer? os.path.walk? With longs, it even takes
> | 12 seconds...
>
> One thing that slows your code down is that it calls stat() three
> times on every regular file in the tree: First, in os.path.isfile,
> second, in os.path.getsize, and third, in os.path.walk, which needs to
> find out if a filename corresponds to a directory or not.
I see.
> | Can I optimize it? If so how?
>
> Here is my best effort so far. It is nearly three times as fast as
> yours (but less portable perhaps). Well, actually yours didn't work
> at all on my system, because the length of a file is a long integer:
>
>   File "du.py", line 21, in du
>     return len(disk)
> TypeError: __len__() should return an int
A long? I don't see any long? Perhaps you are running a "future"
version of Python silently converting ints to longs?
And, by the way, why can't len() return a long integer?
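For readers on a current interpreter, here is a minimal sketch of how the len() restriction looks today. It assumes Python 3, not the version discussed in this thread: int and long are unified there, but len() still requires the result of __len__ to fit in an index-sized integer (sys.maxsize). The helper len_overflows and the public attribute size are hypothetical names used only for illustration.

```python
import sys


class DiskUsage:
    """Accumulates a byte count that may grow beyond sys.maxsize."""

    def __init__(self):
        self.size = 0

    def add(self, nbytes):
        self.size += nbytes

    def __len__(self):
        # len() only accepts this value if it fits in an index-sized integer.
        return self.size


def len_overflows(value):
    """Return True if len() rejects an object whose __len__ returns value."""
    usage = DiskUsage()
    usage.add(value)
    try:
        len(usage)
    except OverflowError:
        return True
    return False
```

Under that assumption, a plain method or attribute is probably a safer way to expose a byte total than __len__, since a total has no such upper bound.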
> class DiskUsage:
...
>     def __call__(self, dir):
>         # Importing these names is possibly a useless optimization:
>         from stat import S_ISDIR, S_ISREG, ST_MODE, ST_SIZE
>         files = os.listdir(dir)
>         dirs = []
>         for file in files:
>             filename = os.path.join(dir, file)
>             s = os.lstat(filename)
>             mode = s[ST_MODE]
>             if S_ISDIR(mode):
>                 dirs.append(filename)
>             elif S_ISREG(mode):
>                 self.__size = self.__size + s[ST_SIZE]
>         for dir in dirs:
>             self(dir)
>     def len(self):
>         return self.__size
...
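The one-lstat-per-entry idea from the quoted code can be sketched as a self-contained function. This is not the code from the thread: it assumes Python 3, uses an explicit list of pending directories instead of recursion, and the names disk_usage and demo are hypothetical. Like the quoted version, it does not follow symbolic links and counts only regular files.

```python
import os
import stat
import tempfile


def disk_usage(root):
    """Return the total size in bytes of the regular files under root.

    Each directory entry is examined with a single os.lstat() call,
    and symbolic links are not followed.
    """
    total = 0
    pending = [root]  # directories still to be scanned
    while pending:
        directory = pending.pop()
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            st = os.lstat(path)  # the only stat call for this entry
            if stat.S_ISDIR(st.st_mode):
                pending.append(path)
            elif stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def demo():
    """Build a small temporary tree and measure it."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        with open(os.path.join(root, "a.bin"), "wb") as f:
            f.write(b"x" * 100)
        with open(os.path.join(root, "sub", "b.bin"), "wb") as f:
            f.write(b"y" * 250)
        return disk_usage(root)
```

Calling demo() measures a throwaway tree holding two files of 100 and 250 bytes. The figure is a sum of file lengths, so it will generally differ from what 'du' reports, which counts allocated disk blocks.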
Interesting, thanks!
regards,
Gerrit.
--
Comparison Python GUI's: http://www.nl.linux.org/~gerrit/gui.html
Please comment!