[Tutor] Help please
Pinedo, Ruben A
rapinedo at miners.utep.edu
Wed Oct 16 20:49:01 CEST 2013
I was given this code and I need to modify it to do the following:
#1. Add error handling for the files so that only .txt files are read.
#2. Print a range of the top words, e.g. the 10th through 20th most common words.
#3. Print only the words with more than 3 characters.
#4. Modify the printing function so it can print the top 1, 2, 3, ... words.
#5. Report how many unique words of length 1, 2, 3, etc. there are in the book.
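For item #1, here is one minimal sketch in Python 3. The helper names open_txt_file and safe_read_lines are hypothetical and not part of the assignment code. The idea is to check the extension with os.path.splitext before opening, and to handle a bad extension (ValueError) separately from a missing or unreadable file (OSError):

```python
import os


def open_txt_file(filename):
    """Open filename for reading, but only if it names a .txt file.

    Raises ValueError for any other extension. OSError (missing file,
    permission denied, ...) is left for the caller to handle.
    """
    # os.path.splitext('emma.txt') returns ('emma', '.txt')
    ext = os.path.splitext(filename)[1]
    if ext.lower() != '.txt':
        raise ValueError('only .txt files are accepted, got %r' % filename)
    return open(filename)


def safe_read_lines(filename):
    """Return the file's lines, or None after printing an error message."""
    try:
        with open_txt_file(filename) as fp:
            return fp.readlines()
    except ValueError as err:
        print('Error:', err)
    except OSError as err:
        print('Could not read file:', err)
    return None
```

In process_file, the line that opens the file could call open_txt_file instead of open, with a try/except like the one above around the call.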
I am fairly new to Python and am completely lost. I looked in my book for how to do number one, but I cannot figure out what to modify and/or delete to add the print selection. This is the code:
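import string


def process_file(filename):
    """Build a word-frequency histogram (word -> count) for a text file."""
    hist = dict()
    # 'with' closes the file automatically when the block ends
    with open(filename) as fp:
        for line in fp:
            process_line(line, hist)
    return hist


def process_line(line, hist):
    """Add the words in one line to hist."""
    # Treat hyphens as word separators
    line = line.replace('-', ' ')
    for word in line.split():
        # Remove surrounding punctuation and normalize case
        word = word.strip(string.punctuation + string.whitespace)
        word = word.lower()
        if word:  # skip tokens that were pure punctuation, e.g. '...'
            hist[word] = hist.get(word, 0) + 1


def common_words(hist):
    """Return a list of (count, word) pairs, most frequent first."""
    t = []
    for key, value in hist.items():
        t.append((value, key))
    t.sort(reverse=True)
    return t


def most_common_words(hist, num=100):
    """Print the num most frequent words with their counts."""
    t = common_words(hist)
    print('The most common words are:')
    for freq, word in t[:num]:
        print(freq, '\t', word)


hist = process_file('emma.txt')
print('Total num of Words:', sum(hist.values()))
print('Total num of Unique Words:', len(hist))
most_common_words(hist, 50)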
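For items #2 and #4, one possible sketch (Python 3; top_word_range and print_word_range are hypothetical names) treats "top N" as the special case "ranks 1 to N" of a general range. Because rank r sits at list index r - 1, ranks start..end correspond to the slice [start - 1:end]:

```python
def common_words(hist):
    """Return (count, word) pairs, most frequent first (as in the original)."""
    return sorted(((count, word) for word, count in hist.items()), reverse=True)


def top_word_range(hist, start, end):
    """Return the (count, word) pairs ranked start..end, 1-based and inclusive."""
    if start < 1 or end < start:
        raise ValueError('need 1 <= start <= end')
    # Rank r sits at list index r - 1, so ranks start..end are [start - 1:end]
    return common_words(hist)[start - 1:end]


def print_word_range(hist, start, end):
    """Print ranks start..end; print_word_range(hist, 1, n) prints the top n."""
    print('Words ranked %d to %d:' % (start, end))
    for rank, (freq, word) in enumerate(top_word_range(hist, start, end), start):
        print(rank, freq, word, sep='\t')
```

With this, print_word_range(hist, 10, 20) covers #2 and print_word_range(hist, 1, 3) covers #4. To let the user choose the numbers, they could be read with int(input(...)).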
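For item #3, a simple approach, sketched here under the assumption that "words" means the keys of hist, is to filter the histogram before printing. The function words_longer_than is a hypothetical name. It returns a new dictionary, so the existing printing code can be reused unchanged, e.g. most_common_words(words_longer_than(hist), 50):

```python
def words_longer_than(hist, n=3):
    """Return a new histogram holding only words with more than n characters."""
    return {word: count for word, count in hist.items() if len(word) > n}


def print_long_words(hist, n=3, num=50):
    """Print the num most common words that have more than n characters."""
    long_hist = words_longer_than(hist, n)
    ranked = sorted(((c, w) for w, c in long_hist.items()), reverse=True)
    print('Most common words longer than %d characters:' % n)
    for freq, word in ranked[:num]:
        print(freq, word, sep='\t')
```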
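For item #5, each key in hist is already one unique word, so counting the keys by their length gives the answer. The sketch below (Python 3; unique_words_by_length and print_length_table are hypothetical names) uses collections.Counter. As a sanity check, the counts summed over all lengths should equal len(hist), the "Total num of Unique Words" the original code prints:

```python
from collections import Counter


def unique_words_by_length(hist):
    """Map each word length to the number of distinct words of that length."""
    # Each key of hist is one unique word, so count keys by their length
    return Counter(len(word) for word in hist)


def print_length_table(hist):
    """Print a two-column table: word length, number of unique words."""
    counts = unique_words_by_length(hist)
    print('Length\tUnique words')
    for length in sorted(counts):
        print(length, counts[length], sep='\t')
```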
Any help would be greatly appreciated, because I am struggling in this class. Thank you in advance.
Respectfully,
Ruben Pinedo
Computer Information Systems
College of Business Administration
University of Texas at El Paso