Python example: possible speedup?

Wed Sep 8 17:00:36 EDT 1999

On Wed, Sep 08, 1999 at 05:37:38PM +0200, Hrvoje Niksic wrote:
> As a Python exercise, I wrote a simple program to "scratch an itch",
> i.e. do something useful.  However, I found that Python's lack of
> speed really bytes me here, so I'd like to hear suggestions for
> speedup.  People who don't like that kind of topic, please skip to the 
> following article.  Others, read on.

... 

> The program was quite easy to write, and easy to read afterwards.  The
> problem is that it is also quite slow.  On my system, it takes about
> 27 CPU seconds (as reported by `time' shell builtin) to do the work,
> which can extend to more than a minute of real time, depending on the
> system load.
> 
> As a comparison, the equivalent Perl program does the same thing in 9
> CPU seconds.  I tried everything I knew to make the Python version
> fast.  I tried to use `re' to avoid returning headers other than the
> ones we're interested in.  I tried changing self.__current to just
> current to avoid a dictionary lookup.  I tried to make self.__current
> a list, to avoid the expensive `current = current + line' operation.
> All of these things made the program measure slower.
> 
> I would really appreciate some suggestions.  The code is not large,
> and is (I hope) rather elegant.  I am a Python beginner, so I'd also
> appreciate tips on Python style and OO technique.  I'll post/mail the
> Perl equivalent on demand.

Hi,

The following code is about five times faster here. That means it's
faster than Perl, I suppose :-). It is of course not quite as general
as yours, but it seems to fit the job nicely. For this kind of
problem, I usually don't bother with OO.

The main speed advantage seems to be that the files aren't processed
line by line. Instead, each file is read into memory in one go and
split on blank lines, which is of course very memory consuming.
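
To illustrate the difference, here is a minimal sketch written in
present-day Python 3 syntax (an assumption; the script further down
uses the 1999 idioms). The sample text and both function names are
made up for the illustration. Both functions produce the same list of
stanzas, but the second one does the splitting in a single call
instead of a Python-level loop over every line.

```python
# Hypothetical sample in the blank-line-separated "Field: value" layout
# that the script expects; package names and values are invented.
SAMPLE = (
    "Package: foo\n"
    "Status: install ok installed\n"
    "\n"
    "Package: bar\n"
    "Status: deinstall ok config-files\n"
)

def stanzas_line_by_line(text):
    """Collect stanzas by visiting every line in Python code."""
    stanzas, current = [], []
    for line in text.split("\n"):
        if line:
            current.append(line)
        elif current:
            # A blank line closes the stanza collected so far.
            stanzas.append("\n".join(current))
            current = []
    if current:
        stanzas.append("\n".join(current))
    return stanzas

def stanzas_whole_file(text):
    """Split the whole text on blank lines in one call."""
    return [s.strip("\n") for s in text.split("\n\n") if s.strip()]

# Both approaches should agree on the result.
print(stanzas_line_by_line(SAMPLE) == stanzas_whole_file(SAMPLE))
```

The trade-off is the one mentioned above: the second version needs the
complete file contents in memory before it can start.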

HTH,
Robert

#! /usr/bin/env python

import string

def get_installed():
    fstatus = open('/var/lib/dpkg/status', 'r')
    status = fstatus.read()
    fstatus.close()
    status = string.split(status, '\n\n')
    installed = []
    for package in status:
        fields = string.split(package, '\n')
        # The first line is assumed to be 'Package: <name>'; skip the prefix.
        name = fields[0][9:]
        for line in fields:
            if line[:7] == 'Status:':
                if string.split(line[8:])[-1] == 'installed':
                    installed.append(name)
                break
    return installed

def get_sizes(packages):
    favailable = open('/var/lib/dpkg/available', 'r')
    available = favailable.read()
    favailable.close()
    available = string.split(available, '\n\n')
    results = []
    for package in available:
        fields = string.split(package, '\n')
        name = fields[0][9:]
        if name in packages:
            for line in fields:
                if line[:15] == 'Installed-Size:':
                    # append() takes a single argument, so pass one tuple.
                    results.append((name, int(line[16:])))
                    break
    return results

def main():
    results = get_sizes(get_installed())
    results.sort(lambda a, b: cmp(b[1], a[1]))
    for r in results:
        print '%s: %d' % r

if __name__ == '__main__':
    main()
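
For readers on a current interpreter, the same approach could be
sketched in Python 3 roughly as follows. This is an adaptation, not
the script as posted: the functions take the file contents as strings
so that the example runs without a Debian system, the function names
and sample data are made up, and a set replaces the list for the
membership test, which is an extra change on top of the original.

```python
def installed_names(status_text):
    """Return the names of packages whose Status line ends in 'installed'."""
    names = set()
    for stanza in status_text.split("\n\n"):
        fields = stanza.strip("\n").split("\n")
        # Like the original, assume each stanza starts with "Package: <name>".
        if not fields[0].startswith("Package: "):
            continue
        name = fields[0][len("Package: "):]
        for line in fields:
            if line.startswith("Status:"):
                if line.split()[-1] == "installed":
                    names.add(name)
                break
    return names

def installed_sizes(available_text, names):
    """Return (name, size) pairs for the given names, largest size first."""
    results = []
    for stanza in available_text.split("\n\n"):
        fields = stanza.strip("\n").split("\n")
        if not fields[0].startswith("Package: "):
            continue
        name = fields[0][len("Package: "):]
        if name in names:  # set lookup instead of scanning a list
            for line in fields:
                if line.startswith("Installed-Size:"):
                    size = int(line[len("Installed-Size:"):])
                    results.append((name, size))
                    break
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results

# Invented sample data standing in for the two dpkg files.
STATUS = (
    "Package: foo\nStatus: install ok installed\n\n"
    "Package: bar\nStatus: deinstall ok config-files\n\n"
    "Package: baz\nStatus: install ok installed\n"
)
AVAILABLE = (
    "Package: foo\nInstalled-Size: 120\n\n"
    "Package: bar\nInstalled-Size: 999\n\n"
    "Package: baz\nInstalled-Size: 4500\n"
)

for name, size in installed_sizes(AVAILABLE, installed_names(STATUS)):
    print("%s: %d" % (name, size))
```

With real data, the two strings would come from reading
/var/lib/dpkg/status and /var/lib/dpkg/available in full, as the
script above does.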

-- 
Robert Vollmert                                      rvollmert at gmx.net