Python example: possible speedup?
John Mitchell
johnm at magnet.com
Wed Sep 8 13:57:14 EDT 1999
On 8 Sep 1999, Hrvoje Niksic wrote:
> As a Python exercise, I wrote a simple program to "scratch an itch",
> i.e. do something useful. However, I found that Python's lack of
> speed really bytes me here, so I'd like to hear suggestions for
> speedup. People who don't like that kind of topic, please skip to the
> following article. Others, read on.
>
> When browsing my Debian packages, I found I often wanted to know the
> size of the installed packages.
A few suggestions:
1) almost never use a catch-all-exceptions block, as you do in your
next_header method.
2) avoid doing semi-slow operations in the middle of a loop:
- string addition (self.__current = self.__current + line)
- reading a file one line at a time (self.__fp.readline())
3) lambdas are fun, but very slow.
4) objects are your friend. I don't know about speed, but they simplify code
*so much* that I always use them, even for dumb dictionary-like and
list-like things.
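A minimal sketch of points 1) and 2), written in present-day Python rather than the 1.5-era style of the example code. The names read_header and parse_size are made up for illustration and are not taken from your program:

```python
def read_header(lines):
    """Collect lines up to the first blank one and join them once at the end."""
    parts = []
    for line in lines:
        if not line.strip():
            break
        # Appending to a list is cheap; the single join below replaces
        # the repeated string addition inside the loop.
        parts.append(line)
    return "".join(parts)


def parse_size(text):
    """Convert a size field to an int, catching only the error we expect."""
    try:
        return int(text)
    except ValueError:
        # Anything else (a typo in the code, a missing name) still surfaces.
        return 0
```

The point of the narrow except clause is that an unexpected error is not silently swallowed.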
That is:
1) make low-level object classes, usually subclassing from UserDict or
UserList.
2) do bulk operations, instead of doing a lot of stuff in the
middle of a loop. Several short loops are better than one long one.
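For the bulk approach in 2), here is a rough sketch in present-day Python: take the whole text at once, split it on blank lines, then run a couple of short loops. parse_stanzas and sizes_largest_first are names I made up; the field names ('Package', 'Installed-Size') are the ones the example code uses. The sort uses a key function from the operator module instead of a lambda comparison:

```python
from operator import itemgetter


def parse_stanzas(text):
    """Return one dict per blank-line-separated stanza of 'Key: value' lines."""
    entries = []
    for stanza in text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines start with a space; this sketch skips them.
            if not line or line[0] == " " or ":" not in line:
                continue
            key, value = line.split(":", 1)
            fields[key] = value.strip()
        if fields:
            entries.append(fields)
    return entries


def sizes_largest_first(entries):
    """Return (package, size) pairs sorted by size, largest first."""
    pairs = [(e["Package"], int(e.get("Installed-Size", 0))) for e in entries]
    pairs.sort(key=itemgetter(1), reverse=True)
    return pairs
```

Whether this is actually faster on a real status file is something to measure, not assume.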
Work is slow, so I've coded up an "example". It doesn't really work, since
I don't have 'status' or 'available' files to use -- please send me yours
(privately via email), and I'll get this code to work on them.
- j
import string
from UserList import UserList
from UserDict import UserDict
# a single package entry, which is like a dictionary.
# IE: ent['Source'] => 'netkit-telnet'
#
class Entry(UserDict):
    def isInstalled(self):
        # "install ok installed" => 1, else 0
        return string.split(self['Status'])[-1] == 'installed'

    def installedSize(self):
        if self.isInstalled():
            return string.atoi( self['Installed-Size'] )
        return 0
class DpkgReaderJM(UserDict):
    _primaryKey = 'Package'

    def __init__(self, path=None):
        UserDict.__init__(self)
        if path:
            self.feed(path=path)

    def parseLine(self, line):
        if not line:
            # a blank line ends the current entry: hand it back and start
            # a fresh one, so the next 'Key: value' line has somewhere to go
            e, self._entry = self._entry, Entry()
            return e
        if line[0] == ' ':
            # XX: handle multiline values here (ie: Description)
            pass
        else:
            key, value = string.split(line, ':', 1)
            self._entry[key] = string.strip(value)

    # def flush(self):
    #     return self._entry

    def feed(self, path=None, data=None):
        if not data:
            data = open(path).read()
        self._entry = Entry()
        entryList = map(self.parseLine, string.split(data, '\n'))
        # skip blank entries:
        entryList = filter(None, entryList)
        primaryKey = self._primaryKey
        for ent in entryList:
            self.data[ent[primaryKey]] = ent
class StatusReader(DpkgReaderJM):
    pass

class AvailableReader(DpkgReaderJM):
    pass
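def main_orig():
    # Unfinished port of the original main(). 'sizes' is built here from the
    # status entries so that the sort and the report can run.
    installed = StatusReader('/var/lib/dpkg/status')
    avail = AvailableReader('/var/lib/dpkg/available')
    sizes = {}
    for name in installed.keys():
        sizes[name] = installed[name].installedSize()
    lst = sizes.keys()
    lst.sort(lambda a, b, sizes=sizes: cmp(sizes[b], sizes[a]))
    for pack in lst:
        print "%s: %d" % (pack, sizes[pack])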
def main():
    global status
    status = StatusReader('status')
    print 'applications:\n\t', status.keys()
    print 'telnet installed size:\n\t', status['telnet'].installedSize()

if __name__=="__main__":
    main()
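The XX in parseLine leaves multiline values (ie: Description) unhandled. One way they might be folded in, as a standalone sketch in present-day Python: merge each line that starts with a space into the line before it, before any 'Key: value' splitting happens. fold_continuations is a made-up helper, not part of the code above, and it assumes continuation lines start with a single space:

```python
def fold_continuations(lines):
    """Merge lines that start with a space into the preceding non-blank line."""
    groups = []
    for line in lines:
        # A continuation needs a non-blank line before it to attach to.
        if line[:1] == " " and groups and groups[-1][0]:
            groups[-1].append(line[1:])
        else:
            groups.append([line])
    # Join each group once, rather than adding strings inside the loop.
    return ["\n".join(group) for group in groups]
```

The folded list could then be passed through parseLine one item at a time, with the blank lines still marking where each entry ends.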