[Tutor] Most common words in a text file
Alan Gauld
alan.gauld at yahoo.co.uk
Sun Oct 1 07:31:04 EDT 2017
On 30/09/17 18:12, Sri G. wrote:
> import sysimport collections
I assume that should be two lines?
But you can also import multiple modules on a single line.
import sys, collections
Although some folks don't like that style (PEP 8 recommends
one import per line).
> def find_most_common_words(textfile, top=10):
>     ''' Returns the most common words in the textfile.'''
The docstring is slightly inaccurate, since you actually
return a dict mapping the most common words to their *counts*.
It is good practice to specify the precise return
type (list, tuple, dict etc) since that tells the user
what they can do with it once they have it.
Also by using the parameter textfile it is not clear
whether I should pass a file object or a file name.
Again it helps users if the comment is as specific
as possible.
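As a rough sketch of what a more specific docstring might look like: the
parameter name `filename` and the exact wording are my assumptions, not
part of the original post, and the body is otherwise the posted code.

```python
import collections


def find_most_common_words(filename, top=10):
    """Return the most common words in a text file.

    filename: path of the file to read (a string, not an open file object).
    top: how many of the most frequent words to return.

    Returns a dict mapping each lower-cased word to its count.
    """
    # 'filename' is a hypothetical rename of the original 'textfile' parameter
    textfile = open(filename)
    text = textfile.read().lower()
    textfile.close()
    words = collections.Counter(text.split())  # how often each word appears
    return dict(words.most_common(top))
```

Now the caller can tell from help() alone that a path goes in and a
dict of word -> count comes out.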
>     textfile = open(textfile)
>     text = textfile.read().lower()
This is a potential memory hog; others have already suggested
reading the file line by line.
>     textfile.close()
>     words = collections.Counter(text.split()) # how often each word appears
>
>     return dict(words.most_common(top))
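For what it's worth, the line-by-line reading mentioned above could look
something like this. It is only a sketch: the function name
`count_words_by_line` and the parameter `filename` are made up for
illustration, and it assumes the default text encoding suits the file.

```python
import collections


def count_words_by_line(filename, top=10):
    """Return a dict of the `top` most common words mapped to their counts."""
    words = collections.Counter()
    # The with statement closes the file even if an error occurs
    with open(filename) as textfile:
        for line in textfile:
            # Only one line is held in memory at a time
            words.update(line.lower().split())
    return dict(words.most_common(top))
```

Counter.update() adds to the existing counts rather than replacing
them, so the totals come out the same as reading the whole file at once.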
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos