[Tutor] Most common words in a text file

Alan Gauld alan.gauld at yahoo.co.uk
Sun Oct 1 07:31:04 EDT 2017


On 30/09/17 18:12, Sri G. wrote:

> import sysimport collections

I assume that should be two lines?

But you can also import multiple modules on a single line.

import sys, collections

Although some folks don't like that style; PEP 8, for example,
recommends putting imports on separate lines.

> def find_most_common_words(textfile, top=10):
>     ''' Returns the most common words in the textfile.'''

The docstring is slightly inaccurate, since you actually
return a dict of the most common words *with their counts*.
It is good practice to specify the precise return
type (list, tuple, dict etc.) since that tells the user
what they can do with it once they have it.
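To make the point concrete, here is a small sketch of what
`Counter.most_common()` itself returns and what wrapping it in `dict()`
turns it into. The sample text is made up purely for illustration.

```python
import collections

# Hypothetical sample text, for illustration only
text = "the cat and the hat and the bat"
counts = collections.Counter(text.split())

# most_common() returns a list of (word, count) tuples, highest count first
pairs = counts.most_common(2)
print(pairs)      # [('the', 3), ('and', 2)]

# dict() turns those pairs into a word -> count mapping
as_dict = dict(pairs)
print(as_dict)    # {'the': 3, 'and': 2}
```

The list makes the ranking explicit; the dict is handier for lookups
such as `as_dict['the']` (and since Python 3.7 it keeps insertion order
too). Either is fine, but the docstring should say which one you get.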

Also, with a parameter named textfile it is not clear
whether I should pass a file object or a file name.
Again, it helps users if the docstring is as specific
as possible.
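Something along these lines would answer both questions. This is only
a sketch: the `filename` parameter name and the docstring wording are
suggestions, and it uses a `with` block so the file is closed
automatically.

```python
import collections

def find_most_common_words(filename, top=10):
    """Return a dict mapping the `top` most common words in the
    file named `filename` to their counts.

    filename -- path of a text file (a string), not a file object
    top      -- how many words to include (default 10)
    """
    with open(filename) as textfile:   # closed automatically on exit
        text = textfile.read().lower()
    words = collections.Counter(text.split())
    return dict(words.most_common(top))
```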

>     textfile = open(textfile)
>     text = textfile.read().lower()

Potential memory hog: this reads the whole file into memory
at once. Others have already suggested reading it line by line.
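A minimal sketch of the line-by-line approach, assuming a file name is
passed in (the helper name `count_words_by_line` is hypothetical).
Iterating over a file object yields one line at a time, and
`Counter.update()` adds each line's words to the running totals, so
only the current line needs to be held in memory.

```python
import collections

def count_words_by_line(filename):
    """Return a Counter of lower-cased words, reading one line at a time."""
    counts = collections.Counter()
    with open(filename) as textfile:
        for line in textfile:              # lines are read lazily
            counts.update(line.lower().split())
    return counts
```

You would then call `count_words_by_line('somefile.txt').most_common(10)`.
Note that a file consisting of one enormous line would still be read
in one go.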

>     textfile.close()
>     words = collections.Counter(text.split()) # how often each word appears
> 
>     return dict(words.most_common(top))

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
