
On Fri, Feb 27, 2009 at 5:14 AM, Gabriel Genellina <gagsl-py2@yahoo.com.ar> wrote:
> I think a r.e. cannot handle this query well enough. This script uses the
> tokenize module and should be immune to those false positives (but it's
> much slower).
Thanks, Gabriel. Using the tokenize module is, indeed, much better. I think
the performance problems were caused by the O(n**2) cost of reading through
the directories and then removing them selectively. I modified it to have an
O(n) cost and it's quite snappy.

#!/usr/bin/python
from __future__ import with_statement
import sys, os
from tokenize import generate_tokens
from token import NAME

def process(paths):
    nfiles = nwith = ntrywith = 0
    for path in paths:
        for base, dirs, files in os.walk(path):
            if nfiles:
                print '%d "try+with" out of %d "with" (%.1f%%) in %d files (so far)' % (
                    ntrywith, nwith, ntrywith*100.0/nwith if nwith else 0, nfiles)
            print base
            # Prune unwanted directories in O(n) by rebuilding the list,
            # instead of removing entries one at a time (O(n**2)).
            newdirs = []
            for d in list(dirs):
                if d == 'CVS' or d == '_darcs' or d[0] == '.':
                    continue
                newdirs.append(d)
            dirs[:] = newdirs
            for fn in files:
                if fn[-3:] != '.py':
                    continue
                fullfn = os.path.join(base, fn)
                #print fn
                nfiles += 1
                with open(fullfn) as f:
                    try:
                        was_try = False
                        for toknum, tokval, _, _, _ in generate_tokens(f.readline):
                            if toknum == NAME:
                                is_with = tokval == 'with'
                                if is_with:
                                    nwith += 1
                                    if was_try:
                                        ntrywith += 1
                                # Remember whether the last NAME token was 'try'
                                was_try = tokval == 'try'
                    except Exception, e:
                        print e
    print '%d "try+with" out of %d "with" (%.1f%%) in %d files' % (
        ntrywith, nwith, ntrywith*100.0/nwith if nwith else 0, nfiles)

process(sys.argv[1:])

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
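For context, the reason tokenize beats a regex here is that the tokenizer only
reports `with` as a NAME token when it is actual code; occurrences inside
string literals and comments arrive as STRING and COMMENT tokens and are never
counted. A minimal sketch of the same idea in Python 3 syntax (the script
above targets Python 2; the `count_with` helper and the sample source are
illustrative, not part of the original script):

```python
import io
import token
import tokenize

def count_with(source):
    """Count occurrences of the `with` keyword, skipping strings and comments."""
    count = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Keywords come through as NAME tokens; 'with' inside a STRING or
        # COMMENT token never matches this test, so no false positives.
        if tok.type == token.NAME and tok.string == 'with':
            count += 1
    return count

src = "s = 'with'  # with, in a comment\nwith open('f') as f:\n    pass\n"
print(count_with(src))  # -> 1: only the real statement is counted
```

A naive regex such as `\bwith\b` would report three matches on the same
source, which is exactly the false-positive problem the thread is about.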