[Tutor] Syntax for Simplest Way to Execute One Python Program Over 1000's of Datasets

B G compbiocancerresearcher at gmail.com
Thu Jun 9 21:49:21 CEST 2011


I'm trying to analyze thousands of different cancer datasets by running the
same Python program on each of them.  I use Windows XP, Python 2.7 and the
IDLE interpreter.  The input files are already in one directory, and I want
to learn the syntax for the quickest way to execute the program over all of
these datasets.

As an example, for the sample Python program below, I don't want to have to
open the program each time and change filename and countfile by hand.  A
computer could do this much more quickly than I ever could.  Thanks in advance!



import string

filename = 'draft1.txt'
countfile = 'draft1_output.txt'

def add_word(counts, word):
    if word in counts:  # 'in' replaces the deprecated dict.has_key()
        counts[word] += 1
    else:
        counts[word] = 1

def get_word(item):
    word = ''
    item = item.strip(string.digits)
    item = item.lstrip(string.punctuation)
    item = item.rstrip(string.punctuation)
    word = item.lower()
    return word


def count_words(text):
    text = ' '.join(text.split('--')) #replace '--' with a space
    items = text.split() #leaves in leading and trailing punctuation,
                         #'--' not recognised by split() as a word separator
    counts = {}
    for item in items:
        word = get_word(item)
        if not word == '':
            add_word(counts, word)
    return counts

infile = open(filename, 'r')
text = infile.read()
infile.close()

counts = count_words(text)

outfile = open(countfile, 'w')
outfile.write("%-18s%s\n" %("Word", "Count"))
outfile.write("=======================\n")

counts_list = sorted(counts.items())
for word in counts_list:
    outfile.write("%-18s%d\n" %(word[0], word[1]))

outfile.close()
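
For reference, one possible way to avoid editing filename and countfile by
hand is to wrap the read/count/write steps in a function and loop over the
directory with glob.  This is only a rough sketch, not a definitive answer:
the names process_file and process_directory, the '*.txt' pattern and the
'_output' suffix are assumptions based on the draft1.txt / draft1_output.txt
pair above, and count_words is condensed from the script above.

```python
import glob
import os
import string


def count_words(text):
    """Count words as in the script above (condensed into one function)."""
    counts = {}
    for item in text.replace('--', ' ').split():
        # Strip digits, then surrounding punctuation, then lowercase.
        word = item.strip(string.digits).strip(string.punctuation).lower()
        if word:
            counts[word] = counts.get(word, 0) + 1
    return counts


def process_file(filename, countfile):
    """Read one input file and write its word counts to countfile."""
    infile = open(filename, 'r')
    text = infile.read()
    infile.close()

    counts = count_words(text)

    outfile = open(countfile, 'w')
    outfile.write("%-18s%s\n" % ("Word", "Count"))
    outfile.write("=======================\n")
    for word, count in sorted(counts.items()):
        outfile.write("%-18s%d\n" % (word, count))
    outfile.close()


def process_directory(input_dir, pattern='*.txt', suffix='_output'):
    """Run process_file on every matching file; return the output paths.

    The pattern and suffix defaults are assumptions, not part of the
    original script.
    """
    written = []
    for filename in sorted(glob.glob(os.path.join(input_dir, pattern))):
        base, ext = os.path.splitext(filename)
        if base.endswith(suffix):
            continue  # skip output files left over from an earlier run
        countfile = base + suffix + ext
        process_file(filename, countfile)
        written.append(countfile)
    return written
```

Calling process_directory with the folder that holds the datasets (for
example a hypothetical r'C:\datasets') would then write one *_output.txt
file next to each input file.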