[Tutor] Syntax for Simplest Way to Execute One Python Program Over 1000's of Datasets
B G
compbiocancerresearcher at gmail.com
Thu Jun 9 21:49:21 CEST 2011
I'm trying to analyze thousands of different cancer datasets by running the
same Python program on each of them. I use Windows XP, Python 2.7 and the IDLE
interpreter. I already have the input files in a directory, and I want to
learn the syntax for the quickest way to execute the program over all of these
datasets.
As an example, for the sample Python program below, I don't want to have to
go into the program each time and change filename and countfile by hand. A
computer could do this much quicker than I ever could. Thanks in advance!
import string

filename = 'draft1.txt'
countfile = 'draft1_output.txt'

def add_word(counts, word):
    if word in counts:
        counts[word] += 1
    else:
        counts[word] = 1

def get_word(item):
    word = ''
    item = item.strip(string.digits)
    item = item.lstrip(string.punctuation)
    item = item.rstrip(string.punctuation)
    word = item.lower()
    return word

def count_words(text):
    # '--' is not recognised by split() as a word separator,
    # so replace it with a space first
    text = ' '.join(text.split('--'))
    # split() leaves in leading and trailing punctuation
    items = text.split()
    counts = {}
    for item in items:
        word = get_word(item)
        if word != '':
            add_word(counts, word)
    return counts

infile = open(filename, 'r')
text = infile.read()
infile.close()

counts = count_words(text)

outfile = open(countfile, 'w')
outfile.write("%-18s%s\n" % ("Word", "Count"))
outfile.write("=======================\n")
counts_list = sorted(counts.items())
for word in counts_list:
    outfile.write("%-18s%d\n" % (word[0], word[1]))
outfile.close()
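
One possible way to avoid editing filename and countfile by hand is to move the read/count/write steps of the program above into a function and call it once per input file. The following is only a minimal sketch under some assumptions: the names process_file and run_all, the '*.txt' pattern and the '_output' suffix are made up for illustration, and count_words here is a simplified stand-in for the real one so that the example runs on its own. It uses only the standard glob and os modules and should run under Python 2.7.

```python
import glob
import os


def count_words(text):
    # Stand-in for the count_words() function in the program above;
    # simplified here so that the sketch is self-contained.
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def process_file(filename, countfile):
    # Same read/count/write steps as the original script, but with the
    # two file names passed in as arguments instead of hard-coded.
    infile = open(filename, 'r')
    text = infile.read()
    infile.close()
    counts = count_words(text)
    outfile = open(countfile, 'w')
    outfile.write("%-18s%s\n" % ("Word", "Count"))
    outfile.write("=======================\n")
    for word, count in sorted(counts.items()):
        outfile.write("%-18s%d\n" % (word, count))
    outfile.close()


def run_all(input_dir, pattern='*.txt', suffix='_output'):
    # Hypothetical driver: process every file in input_dir that matches
    # pattern, writing e.g. draft1.txt -> draft1_output.txt.
    processed = []
    for filename in sorted(glob.glob(os.path.join(input_dir, pattern))):
        base, ext = os.path.splitext(filename)
        if base.endswith(suffix):
            continue  # skip output files left over from an earlier run
        process_file(filename, base + suffix + ext)
        processed.append(filename)
    return processed
```

With something like this saved and run in IDLE, a single call such as run_all(r'C:\datasets') would process every matching file, where C:\datasets is a placeholder for the real input directory. Writing the outputs to a separate directory instead would make the suffix check unnecessary.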