looking for speed-up ideas

Tue Feb 4 14:36:16 EST 2003

Ram Bhamidipaty wrote:
> Ok. Here are two scripts:

Indeed.  I'm still surprised my code is that much slower.

Here are my thoughts:
   - you use xreadlines.xreadlines(f) where I use "for line in f"
       I thought they were the same speed, but perhaps I'm wrong.
       Could you change that and find out?

   - You do
       try / int() / except ValueError / long()
      where I always do long().  It could be that longs are much more
      expensive than ints.  Also, in newer Pythons int() converts
      to a long as needed, so the try/except is no longer needed.

   - you do an explicit compare against the max size before insertion
      into the list while I always append.  Always appending can be a
      big slowdown, so change

      for line in infile:
          if line[:1] == "F":
              ignore, size, name = line.split("/")
              # negate size so 'largest' is sorted first
              fileinfo.append( (-long(size), dirid, name) )
              if len(fileinfo) > 10000:
                  # Could use a heapq....
                  fileinfo.sort()
                  fileinfo = fileinfo[:200]
      into

      min_allowed = 0

      for line in infile:
          if line[:1] == "F":
              ignore, size, name = line.split("/")
              size = long(size)
              # skip files too small to make the top 200
              if size > min_allowed:
                  # negate size so 'largest' is sorted first
                  fileinfo.append( (-size, dirid, name) )
              if len(fileinfo) > 10000:
                  # Could use a heapq....
                  fileinfo.sort()
                  del fileinfo[200:]
                  # smallest size still in the kept 200
                  min_allowed = -fileinfo[-1][0]

(WARNING: untested!)
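
Since the pruning idea is untested, here is a minimal self-contained
sketch that checks it against a plain sort.  It assumes a current
Python (so int() instead of long()).  The name top_sizes, the
keep/prune_at parameters and the made-up "F/size/name" lines are mine,
for illustration only, and dirid is left out.

```python
import random

def top_sizes(lines, keep=200, prune_at=10000):
    """Return the `keep` largest (size, name) pairs from 'F/size/name' lines.

    Assumes prune_at >= keep; files of size 0 are never kept.
    """
    fileinfo = []
    min_allowed = 0
    for line in lines:
        if line[:1] == "F":
            ignore, size, name = line.rstrip("\n").split("/")
            size = int(size)
            # skip anything that cannot make it into the top `keep`
            if size > min_allowed:
                # negate size so 'largest' is sorted first
                fileinfo.append((-size, name))
            if len(fileinfo) > prune_at:
                fileinfo.sort()
                del fileinfo[keep:]
                min_allowed = -fileinfo[-1][0]
    # final prune for whatever accumulated since the last one
    fileinfo.sort()
    del fileinfo[keep:]
    return [(-negsize, name) for negsize, name in fileinfo]

# made-up input: 1000 file lines plus one non-file line that is ignored
random.seed(0)
sizes = [random.randint(1, 10**9) for _ in range(1000)]
lines = ["F/%d/file%d\n" % (s, i) for i, s in enumerate(sizes)]
lines.insert(3, "D/some/dir\n")

result = top_sizes(lines, keep=5, prune_at=50)
# same sizes as sorting everything and taking the first 5
assert [s for s, n in result] == sorted(sizes, reverse=True)[:5]
```

Only the sizes are compared because a file whose size ties with the
current cutoff is skipped, so on ties the names can differ from a
full sort.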

Notice that I also switched it to delete everything after
position 200 instead of creating a new list from the first 200.

When lists grow, they allocate more memory than they need,
to prevent a realloc for every append.  I'm guessing that the
delete will save more memory -- at the least, it keeps
memory in the same position rather than continuously swapping
between two memory segments.
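
To make the difference concrete, here is a small sketch.  The names are
only for illustration, and it shows object identity only, not how much
memory the allocator actually gives back, which I have not measured.

```python
fileinfo = list(range(10000))
alias = fileinfo              # second reference to the same list object

del fileinfo[200:]            # shrinks the existing list in place
assert alias is fileinfo      # still one list object
assert len(alias) == 200

fileinfo = fileinfo[:200]     # builds a new 200-item list and rebinds the name
assert alias is not fileinfo  # the old list object lives on via `alias`
```

With the slice, the old list can only be freed after the new one has
been built, so for a moment both exist.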

Or you can replace it with your heap code (if the C code
doesn't work, try the one from the Python 2.3 standard lib.)
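
A minimal sketch of the heap variant, assuming the heapq module from
the standard library.  heappush and heapreplace are real heapq
functions; largest_with_heap and its parameters are hypothetical names
for illustration.  It keeps a min-heap of at most `keep` entries, so
the smallest kept entry is always at heap[0] and no periodic sort is
needed.

```python
import heapq

def largest_with_heap(items, keep=200):
    """Return the `keep` largest (size, name) pairs, largest first.

    Assumes keep >= 1.
    """
    heap = []  # min-heap: heap[0] is the smallest entry currently kept
    for size, name in items:
        if len(heap) < keep:
            heapq.heappush(heap, (size, name))
        elif (size, name) > heap[0]:
            # drop the current smallest and insert the new entry in one step
            heapq.heapreplace(heap, (size, name))
    return sorted(heap, reverse=True)

# made-up data for a quick check against a full sort
items = [(i * 7 % 101, "f%d" % i) for i in range(101)]
assert largest_with_heap(items, keep=3) == sorted(items, reverse=True)[:3]
```

Later Python versions also ship heapq.nlargest, which does much the
same thing in one call.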

  - I or someone else really needs to work on a better profiler.
After all, the new hotspot code is now > 1 year old!  :)

					Andrew
					dalke at dalkescientific.com