Python example: possible speedup?
Robert Vollmert
rvollmert at gmx.net
Wed Sep 8 17:00:36 EDT 1999
On Wed, Sep 08, 1999 at 05:37:38PM +0200, Hrvoje Niksic wrote:
> As a Python exercise, I wrote a simple program to "scratch an itch",
> i.e. do something useful. However, I found that Python's lack of
> speed really bytes me here, so I'd like to hear suggestions for
> speedup. People who don't like that kind of topic, please skip to the
> following article. Others, read on.
...
> The program was quite easy to write, and easy to read afterwards. The
> problem is that it is also quite slow. On my system, it takes about
> 27 CPU seconds (as reported by `time' shell builtin) to do the work,
> which can extend to more than a minute of real time, depending on the
> system load.
>
> As a comparison, the equivalent Perl program does the same thing in 9
> CPU seconds. I tried everything I knew to make the Python version
> fast. I tried to use `re' to avoid returning headers other than the
> ones we're interested in. I tried changing self.__current to just
> current to avoid a dictionary lookup. I tried to make self.__current
> a list, to avoid the expensive `current = current + line' operation.
> All of these things made the program measure slower.
>
> I would really appreciate some suggestions. The code is not large,
> and is (I hope) rather elegant. I am a Python beginner, so I'd also
> appreciate tips on Python style and OO technique. I'll post/mail the
> Perl equivalent on demand.
Hi,
the following code is about five times faster here. That means it's
faster than Perl, I suppose :-). It is of course not quite as general
as yours, but it seems to fit the job nicely. For this kind of
problem, I usually don't bother with OO.

The main speed advantage seems to be that each file is read in one go
and split into per-package blocks, instead of being processed line by
line. The price is of course that this uses a lot more memory.
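
To illustrate the read-everything-then-split idea on its own, here is a
minimal sketch in present-day Python 3 (not the Python of this post). The
sample text and the helper names split_stanzas and is_installed are made
up for illustration; only the blank-line-separated layout and the
"Status:" field are taken from the code in this message.

```python
# Hypothetical sample in the style of a dpkg status file (not real data).
SAMPLE = (
    "Package: foo\n"
    "Status: install ok installed\n"
    "\n"
    "Package: bar\n"
    "Status: deinstall ok config-files\n"
)


def split_stanzas(text):
    """Split a whole file's text into per-package stanzas."""
    # Packages are separated by a blank line, so a single split on '\n\n'
    # stands in for a line-by-line loop that tracks the current package.
    return [s for s in text.split("\n\n") if s.strip()]


def is_installed(stanza):
    """Return True if the stanza's Status line ends in 'installed'."""
    for line in stanza.split("\n"):
        if line.startswith("Status:"):
            words = line[len("Status:"):].split()
            return bool(words) and words[-1] == "installed"
    return False


stanzas = split_stanzas(SAMPLE)
# The first line of each stanza is assumed to be 'Package: <name>'.
installed = [
    s.split("\n")[0][len("Package: "):] for s in stanzas if is_installed(s)
]
```

The whole text sits in memory at once, which is the trade-off mentioned
above.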
HTH,
Robert
#! /usr/bin/env python

import string


def get_installed():
    # Read the whole status file at once; packages are separated by
    # blank lines.
    fstatus = open('/var/lib/dpkg/status', 'r')
    status = fstatus.read()
    fstatus.close()
    status = string.split(status, '\n\n')
    installed = []
    for package in status:
        fields = string.split(package, '\n')
        # The first line is 'Package: <name>'; skip the 9-char prefix.
        name = fields[0][9:]
        for line in fields:
            if line[:7] == 'Status:':
                if string.split(line[8:])[-1] == 'installed':
                    installed.append(name)
                break
    return installed


def get_sizes(packages):
    favailable = open('/var/lib/dpkg/available', 'r')
    available = favailable.read()
    favailable.close()
    available = string.split(available, '\n\n')
    results = []
    for package in available:
        fields = string.split(package, '\n')
        name = fields[0][9:]
        if name in packages:
            for line in fields:
                if line[:15] == 'Installed-Size:':
                    # append() takes one argument, so pass a tuple.
                    results.append((name, int(line[16:])))
                    break
    return results


def main():
    results = get_sizes(get_installed())
    # Sort by size, largest first.
    results.sort(lambda a, b: cmp(b[1], a[1]))
    for r in results:
        print '%s: %d' % r


if __name__ == '__main__':
    main()
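
One place that could probably be tightened further: "name in packages"
scans a list for every package in the available file. Below is a rough
sketch of the same lookup using a set, written in present-day Python 3
rather than the Python used above. The function name sizes_for and the
sample data are hypothetical, and I have not timed this against the real
dpkg files.

```python
def sizes_for(installed, available_text):
    """Return (name, size) pairs for installed packages, largest first."""
    wanted = set(installed)  # hash lookup instead of scanning a list
    results = []
    for stanza in available_text.split("\n\n"):
        fields = stanza.split("\n")
        # The first line is assumed to be 'Package: <name>'.
        name = fields[0][len("Package: "):]
        if name not in wanted:
            continue
        for line in fields:
            if line.startswith("Installed-Size:"):
                # int() tolerates the space after the colon.
                results.append((name, int(line[len("Installed-Size:"):])))
                break
    # A key function replaces the cmp-style comparison.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Hypothetical sample in the style of the available file (not real data).
AVAILABLE = (
    "Package: foo\nInstalled-Size: 120\n\n"
    "Package: bar\nInstalled-Size: 999\n\n"
    "Package: baz\nInstalled-Size: 300\n"
)

for name, size in sizes_for(["foo", "baz"], AVAILABLE):
    print("%s: %d" % (name, size))
```

With a few hundred installed packages the list scan is likely a small
part of the total run time, so treat this as a tidy-up rather than a
guaranteed speedup.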
--
Robert Vollmert rvollmert at gmx.net