looking for speed-up ideas

Ram Bhamidipaty ramb at sonic.net
Tue Feb 4 23:41:28 EST 2003


> ====< ramb.py >=====================================================
> # ramb.py
> import sys
> lines = file(sys.argv[1]).readlines()
> tups = []; tapp = tups.append; i = -1
> for line in lines:
>     i += 1
>     if line.startswith('F'): tapp((int(line.split('/')[1]), i))
> tups.sort()
> dict200 = dict([(i, size) for size, i in tups[-200:]])
> 
> path = [tuple(lines[0].split()[1:])]
> i = -1
> for line in lines:
>     i += 1
>     if line.startswith('S'):
>         name, parent, thisnum = line[2:].split('/')
>         while path and path[-1][1] != parent: path.pop()
>         path.append((name, thisnum.strip()))
>     if not dict200.has_key(i): continue
>     dict200[i] = (dict200[i], path[:], lines[i]) # size, path, fname
> tups = dict200.values()
> tups.sort()
> fmt = '%12s  %s' 
> print fmt % ('size', 'path')
> print fmt % ('-'*12, '-'*50)
> for size, path, fname in tups:
>     fname = fname.split('/')[-1].strip()
>     path = '/'.join([name for name, num in path]+[fname])
>     print fmt % (size, path)    
> ====================================================================
> running this on your test data

> I'd be curious how long it would run on your machine. I assume your
> memory is large enough to hold the line list.

Thank you for the reply.

Your script ran in:

espring> python /remote/espring/ramb/tools/lib/python2.2/profile.py script4.py /tmp/foo /tmp/foo_4
         3 function calls in 36.720 CPU seconds

You may want to take a look at some of the other postings in this thread. I
suspect this algorithm could also be tweaked into the same 16-second
range as the other scripts.

It would be impressive if there were a _pure_ Python script that could
match the performance of the grep + sort + tail command pipeline.

-Ram
