[Tutor] Most common words in a text file
Sri G.
srigalibe at gmail.com
Sat Sep 30 13:12:46 EDT 2017
I'm learning programming with Python.
I’ve written the code below for finding the most common words in a text
file that has about 1.1 million words. It's working fine, but I believe
there is always room for improvement.
When run, the script takes the name of a text file from the command-line
argument sys.argv[1] and passes it to the function, which opens the file in
read mode, converts the text to lowercase, and splits it on whitespace into
a list of words (so the list contains no empty strings). The words are then
counted in a collections.Counter object, where each word is a key and its
count is the value. Finally, the function returns a dictionary of the most
common words and their counts. The number of words requested from
words.most_common() comes from the optional top parameter.
import sys
import collections

def find_most_common_words(textfile, top=10):
    ''' Returns the most common words in the textfile.'''
    textfile = open(textfile)
    text = textfile.read().lower()
    textfile.close()
    words = collections.Counter(text.split()) # how often each word appears
    return dict(words.most_common(top))

filename = sys.argv[1]
top_five_words = find_most_common_words(filename, 5)
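
One possible variation, offered only as a sketch: the same counting can be done line by line, so the whole text never has to be held in memory at once. The function name most_common_words and the sample lines below are made up for illustration and are not part of the script above. Because str.split() with no arguments treats newlines as whitespace, counting per line should give the same totals as splitting the whole text in one go.

```python
import collections


def most_common_words(lines, top=10):
    """Return a dict of the `top` most common words in an iterable of lines."""
    counts = collections.Counter()
    for line in lines:
        # Counter.update() adds to the existing counts instead of replacing them.
        counts.update(line.lower().split())
    return dict(counts.most_common(top))


# Hypothetical sample data standing in for the lines of a text file.
sample = ["The cat and the dog", "the dog barked"]
result = most_common_words(sample, 2)
print(result)  # {'the': 3, 'dog': 2}
```

An open file object yields its lines when iterated, so it could be passed in directly, for example inside a "with open(filename) as f:" block, which also closes the file even if an error occurs while reading.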
I need your comments please.
Sri