looking for speed-up ideas
Andrew Dalke
adalke at mindspring.com
Tue Feb 4 14:36:16 EST 2003
Ram Bhamidipaty wrote:
> Ok. Here are two scripts:
Indeed. I'm still surprised my code is that much slower.
Here are my thoughts:
- you use xreadlines.xreadlines(f) where I use "for line in f"
I thought they are the same speed, but perhaps I'm wrong.
Could you change that and find out?
- You do
try / int() / except Value Error / long()
when I always do long(). It could be that longs are much more
expensive than ints. Also, in newer pythons the int converts
to a long as needed so this is no longer needed.
- you do an explicit compare against the max size before insertion
into the list while I always append. That can be a big slowdown,
so change
for line in infile:
if line[:1] == "F":
ignore, size, name = line.split("/")
# negate size so 'largest' is sorted first
fileinfo.append( (-long(size), dirid, name) )
if len(fileinfo) > 10000:
# Could use a heapq....
fileinfo.sort()
fileinfo = fileinfo[:200]
into
min_allowed = 0
for line in infile:
if line[:1] == "F":
ignore, size, name = line.split("/")
# negate size so 'largest' is sorted first
size = long(size)
if size > min_allowed:
fileinfo.append( (-long(size), dirid, name) )
if len(fileinfo) > 10000:
# Could use a heapq....
fileinfo.sort()
del fileinfo[200:]
min_allowed = -fileinfo[-1][0]
(WARNING: untested!)
Notice that I also switched it to delete everything after
position 200 instead of creating a new list from the first 200.
When lists grow, they allocate more memory than they need,
to prevent a realloc for every append. I'm guessing that the
delete will save more memory -- at the least, it keeps
memory in the same position than continuously swapping between
two memory segments.
Or you can replace it with your heap code (if the C code
doesn't work, try the one from the Python 2.3 standard lib.)
- I or someone else really needs to work on a better profiler.
After all, the new hotspot code is now > 1 year old! :)
Andrew
dalke at dalkescientific.com
More information about the Python-list
mailing list