looking for speed-up ideas
Ram Bhamidipaty
ramb at sonic.net
Tue Feb 4 23:41:28 EST 2003
> ====< ramb.py >=====================================================
> # ramb.py
> import sys
> lines = file(sys.argv[1]).readlines()
> tups = []; tapp = tups.append; i = -1
> for line in lines:
>     i += 1
>     if line.startswith('F'): tapp((int(line.split('/')[1]), i))
> tups.sort()
> dict200 = dict([(i, size) for size, i in tups[-200:]])
>
> path = [tuple(lines[0].split()[1:])]
> i = -1
> for line in lines:
>     i += 1
>     if line.startswith('S'):
>         name, parent, thisnum = line[2:].split('/')
>         while path and path[-1][1] != parent: path.pop()
>         path.append((name, thisnum.strip()))
>     if not dict200.has_key(i): continue
>     dict200[i] = (dict200[i], path[:], lines[i]) # size, path, fname
> tups = dict200.values()
> tups.sort()
> fmt = '%12s %s'
> print fmt % ('size', 'path')
> print fmt % ('-'*12, '-'*50)
> for size, path, fname in tups:
>     fname = fname.split('/')[-1].strip()
>     path = '/'.join([name for name, num in path]+[fname])
>     print fmt % (size, path)
> ====================================================================
> If you try running this on your test data, I'd be curious how long it
> would take on your machine. I assume your memory is large enough to
> hold the line list.
Thank you for the reply.
Your script ran in:
espring> python /remote/espring/ramb/tools/lib/python2.2/profile.py script4.py /tmp/foo /tmp/foo_4
3 function calls in 36.720 CPU seconds
You may want to take a look at some of the other postings in this thread. I
suspect that this algorithm could also be tweaked to get into the same
16-second range as the other scripts.
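One possible tweak, sketched below: since only the 200 largest entries are
kept, the full sort of every 'F' record can be replaced with a bounded
selection, which is O(n log k) instead of O(n log n). This is only a sketch —
it assumes the 'F/size/name' line format used by the quoted script, and
heapq.nlargest needs a newer Python than the 2.2 shown in the profile run.

```python
import heapq

def top_sizes(lines, k=200):
    """Return (size, line_index) pairs for the k largest 'F' lines,
    largest first, without sorting the whole candidate list."""
    cand = ((int(line.split('/')[1]), i)
            for i, line in enumerate(lines)
            if line.startswith('F'))
    # nlargest keeps a heap of at most k entries while scanning
    return heapq.nlargest(k, cand)
```

The rest of the script's dict200 construction could then consume this list
unchanged, since it produces the same (size, index) tuples.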
It would be impressive if there were a _pure_ Python script that could
deliver the performance of the grep + sort + tail command pipeline.
-Ram