Python example: possible speedup?

John Mitchell johnm at magnet.com
Wed Sep 8 13:57:14 EDT 1999


On 8 Sep 1999, Hrvoje Niksic wrote:

> As a Python exercise, I wrote a simple program to "scratch an itch",
> i.e. do something useful.  However, I found that Python's lack of
> speed really bytes me here, so I'd like to hear suggestions for
> speedup.  People who don't like that kind of topic, please skip to the 
> following article.  Others, read on.
> 
> When browsing my Debian packages, I found I often wanted to know the
> size of the installed packages.


A few suggestions:

1) almost never use a catch-all-exceptions block (a bare "except:"), such
as the one in your next_header method.  It hides real bugs along with the
error you meant to handle.

2) avoid doing semi-slow operations in the middle of a loop:

- string addition (self.__current = self.__current + line)

- reading a file one line at a time (self.__fp.readline())

3) lambdas are fun, but slow when they get called a lot, e.g. once per
comparison inside a sort.

4) objects are your friend.  Don't know about speed, but they simplify code
*so much* that I always use them, even for dumb dictionary-like and
list-like things.
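
Points 1 and 2 above can be sketched roughly as follows.  This is only an
illustration in present-day Python spelling, not code from either program:
parse_size, read_lines_slow, read_lines_bulk and the sample text are all
made-up names and data.

```python
import io

def parse_size(text):
    """Convert a size field to an int, treating malformed text as 0."""
    try:
        return int(text)
    except ValueError:
        # Only the failure we expect is caught here.  A bare "except:"
        # would also swallow NameError, KeyboardInterrupt and the like.
        return 0

def read_lines_slow(fp):
    """One readline() call and one string addition per line."""
    current = ""
    while True:
        line = fp.readline()
        if not line:
            break
        current = current + line   # builds a new string on every pass
    return current.split("\n")

def read_lines_bulk(fp):
    """One read() for the whole file, then a single split."""
    return fp.read().split("\n")

# Hypothetical sample data standing in for a real status file.
sample = "Package: telnet\nInstalled-Size: 96\n"
assert read_lines_slow(io.StringIO(sample)) == read_lines_bulk(io.StringIO(sample))
print(parse_size("96"), parse_size("n/a"))
```

Both readers give the same list of lines; the bulk one just gets there with
two calls instead of two calls per line.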


That is:

1) make low-level object classes, usually subclassing from UserDict or
UserList.

2) do a few big, bulk operations instead of doing a lot of stuff in the
middle of a loop.  Repeated short loops are better.
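
As a rough illustration of "repeated short loops", here is one way to sort
packages by size without calling a lambda for every comparison (the slow
spot from point 3 of the suggestions).  It is a sketch in present-day
Python; by_size_descending and the sample sizes are invented for the
example.

```python
# Hypothetical sample data: package name -> installed size.
sizes = {"telnet": 96, "emacs20": 21000, "python-base": 2800}

def by_size_descending(sizes):
    """Return package names ordered from largest to smallest size."""
    # 1) decorate: one short loop builds (size, name) pairs
    pairs = [(size, name) for name, size in sizes.items()]
    # 2) sort: tuples compare natively, so no Python-level function
    #    is called per comparison
    pairs.sort()
    pairs.reverse()
    # 3) undecorate: one more short loop strips the sizes off again
    return [name for size, name in pairs]

for name in by_size_descending(sizes):
    print("%s: %d" % (name, sizes[name]))
```

Each step is a single bulk pass over the data, and the sort itself never
has to call back into user code.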


Work is slow, so I've coded up an "example".  It doesn't really work, since
I don't have 'status' or 'available' files to use -- please send me yours
(privately via email), and I'll get this code to work on them.


- j


import string

from UserList import UserList
from UserDict import UserDict

# a single package entry, which is like a dictionary.
# IE: ent['Source'] => 'netkit-telnet'
#
class Entry(UserDict):
    def isInstalled(self):
	# "install ok installed" => 1, else 0
	return string.split(self['Status'])[-1] == 'installed'
    def installedSize(self):
	if self.isInstalled():
	    return string.atoi( self['Installed-Size'] )
	return 0

class DpkgReaderJM(UserDict):
    _primaryKey	= 'Package'
    
    def __init__(self, path=None):
	UserDict.__init__(self)
	if path:
	    self.feed(path=path)
	    
    def parseLine(self, line):
	# Called once per input line.  Returns the finished Entry when a
	# blank line ends a stanza, otherwise None.
	if not line:
	    # start a fresh Entry so the next stanza has somewhere to go
	    e, self._entry	= self._entry, Entry()
	    return e
	
	if line[0] == ' ':
	    # XX: handle multiline values here (ie: Description)
	    pass
	else:
	    key, value	= string.split(line, ':', 1)
	    self._entry[key] = string.strip(value)
	    
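    def feed(self, path=None, data=None):
	# parse a whole file (or a string passed in as data) in bulk
	if data is None:
	    data = open(path).read()
	self._entry	= Entry()
	# the extra '' flushes the last entry when the input does not
	# end with a blank line
	entryList	= map(self.parseLine, string.split(data, '\n') + [''])
	# skip blank entries:
	entryList	= filter(None, entryList)
	primaryKey	= self._primaryKey
	for ent in entryList:
	    self.data[ent[primaryKey]] = ent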
	
class StatusReader(DpkgReaderJM):
    pass
class AvailableReader(DpkgReaderJM):
    pass


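def main_orig():
    # rough port of the original main(): list packages by installed
    # size, largest first
    installed = StatusReader('/var/lib/dpkg/status')
    # read but not used yet
    avail = AvailableReader('/var/lib/dpkg/available')
    sizes = {}
    for name in installed.keys():
        sizes[name] = installed[name].installedSize()
    lst = sizes.keys()
    lst.sort(lambda a, b, sizes=sizes: cmp(sizes[b], sizes[a]))
    for pack in lst:
        print "%s: %d" % (pack, sizes[pack])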

def main():
    global status
    status = StatusReader('status')
    print 'applications:\n\t', status.keys()
    print 'telnet installed size:\n\t', status['telnet'].installedSize()
    
if __name__=="__main__":
    main()
    






