[Tutor] Finding the shortest word in a list of words

Emad Nawfal (عماد نوفل) emadnawfal at gmail.com
Tue Jan 20 17:12:18 CET 2009

On Tue, Jan 20, 2009 at 10:33 AM, Paul McGuire <ptmcg at austin.rr.com> wrote:

> "Finding the shortest word among a list of words" sounds like something of
> a
> trick question to me.  I think a more complete problem statement would be
> "Find the list of words that are the shortest", since there is no guarantee
> that the list does not contain two words of the same shortest length.  If
> you just add "me" to your sample set, you now get misleading answers:
> words = "man woman children he me".split()
> print min(words, key=len)
> prints:
> he
> What happened to "me"?  It is just as short as "he"!
> To get *all* the words that are the shortest length, I'll show two
> approaches.  The first uses a defaultdict, from the collections module of
> the Python stdlib.  With a defaultdict, I can have new dict entries get
> initialized with a default value using a specified factory function,
> without
> first having to check to see if that entry exists.  I would like to create
> a
> dict of lists of words, keyed by the lengths of the words.  Something that
> would give me this dict:
> { 2 : ['he', 'me'], 3 : ['man'], ... etc. }
> from which I could then find the minimum length using
> min(wordlendict.keys()).  By using a defaultdict, I can just build things
> up
> by iterating over the list and adding each word to the entry for the
> current
> word's length - if the current word is the first one for this length to be
> found, the defaultdict will initialize the value to an empty list for me.
> This allows me to safely append the current word regardless of whether
> there
> was an existing entry or not.  Here is the code that does this:
> from collections import defaultdict
> wordlendict = defaultdict(list)
> for w in words:
>    wordlendict[len(w)].append(w)
> minlen = min(wordlendict.keys())
> minlist = wordlendict[minlen]
> print minlist
> prints:
> ['he', 'me']
> Now we are getting a more complete answer!
> A second approach uses the groupby method of the itertools module in the
> stdlib.  groupby usually takes me several attempts to get everything right:
> the input must be sorted by the grouping feature, and then the results
> returned by groupby are in the form of an iterator that returns
> key-iterator
> tuples.  I need to go through some mental gymnastics to unwind the data to
> get the group I'm really interested in.  Here is a step-by-step code to use
> groupby:
> from itertools import groupby
> grpsbylen = groupby(sorted(words,key=len),len)
> mingrp = grpsbylen.next()
> minlen = mingrp[0]
> minlist = list(mingrp[1])
> print minlist
> The input list of words gets sorted, and then passed to groupby, which uses
> len as the grouping criteria (groupby is not inherently aware of how the
> input list was sorted, so len must be repeated).  The first group tuple is
> pulled from the grpsbylen iterator using next().  The 0'th tuple element is
> the grouping value - in this case, it is the minimum length 2.  The 1'th
> tuple element is the group itself, given as an iterator.  Passing this to
> the list constructor gives us the list of all the 2-character words, which
> then gets printed:
> ['he', 'me']
> Again, 'me' no longer gets left out.
> Maybe this will get you some extra credit...
> -- Paul

Thank you so much Paul for this. In my original post, I wrote word(s), which
means word or words. Your solutions look  a little bit too advanced to me. I
never used the collections module, and used itertools only once or twice. I
will study these solutions for sure, not for extra credit, simply because
I'm simply a linguistics person who uses corpora to find information without
any official interest in computer science or programming classes. Maybe when
I'm good enough at programming, I will take a Python class for credit,
although I'm past the classes thing now.
Thank you all again for helping me get a better understanding of Python.

> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor

لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد
"No victim has ever been more repressed and alienated than the truth"

Emad Soliman Nawfal
Indiana University, Bloomington

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20090120/f437a33c/attachment-0001.htm>

More information about the Tutor mailing list