[Edu-sig] [edupython] Python in Education Advocacy Article

Michael Tobis mtobis at gmail.com
Tue Mar 27 04:52:31 CEST 2007


<grin>

Seven lines seems reasonable to me:

##########
import sys

concord = {}
for word in [token.lower() for token in open(sys.argv
[1],"r").read().split()]:
   concord[word] = concord.get(word,0) + 1
result = sorted([(item[1],item[0]) for item in concord.items
()],reverse=True)
for pair in result:
   print "%s\t%s" % pair
###########

While I wonder if the problem wasn't set up to be in Python's favor, this
sort of thing is a big plus for Python in daily use.

Terseness, though, is not enough in general and certainly not in education.
Perl makes a virtue of terseness, and I have a strong sense it is not very
useful as a first language.

mt

On 3/26/07, Toby Donaldson <tjd at sfu.ca> wrote:
>
> Michael,
>
> On the sigcse list there was recently a coding challenge
> (http://www.cs.duke.edu/csed/code) that asked for solutions to a word
> frequency problem in various languages. I believe the author is
> planning to eventually list all the solutions received (entries came
> in many different languages); that could make for some interesting
> comparisons.
>
> I wrote one solution in Python, and one in Java, which I give below;
> needless to say, I found the Python version far easier to write.
>
> Toby
>
>
> -------------------------------------------------------------------------------------------------------------------------------
> import sys
>
> if __name__ == '__main__':
>    # get the path from the command-line
>    fname = sys.argv[1]
>
>    # read in the file as a list of tokens
>    tokens = [tok.strip().lower() for tok in open(fname,
> 'r').read().split()]
>
>    # calculate the frequency of each token
>    freq = {}
>    for tok in tokens:
>        if tok in freq:
>            freq[tok] += 1
>        else:
>            freq[tok] = 1
>
>    # Sort the list in highest frequency to lowest frequency,
>    # with ties sorted by lexicographics order of the words.
>    # Uses the Python sort-decorate-unsort idiom. We sort by
>    # the negation of the frequency values to get the proper
>    # ordering.
>
>    lst = [(-freq[tok], tok) for tok in freq]  # decorate
>    lst.sort()                                 # sort
>    lst = [(-freq, tok) for freq, tok in lst]  # undecorate
>
>    # print the results
>    for freq, tok in lst:
>        print '%s\t%s' % (freq, tok)
>
>
> -------------------------------------------------------------------------------------------------------------------------------
>
> import java.io.File;
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.Collections;
> import java.util.Scanner;
> import java.util.TreeMap;
> import java.util.TreeSet;
>
> public class Puzzle {
>
>        public static void main(String[] args) throws IOException {
>                // get the file to process
>                String fname = args[0];
>
>                Scanner sc = new Scanner(new File(fname));
>
>                // initialize the map for the words and counts;
>                // a TreeMap is always ordered by keys
>                TreeMap<String, Integer> map = new TreeMap<String,
> Integer>();
>
>                // process the file a line at a time
>                while (sc.hasNextLine()) {
>                        // chop each line into its constituent tokens
>                        // "\\s+" is a regular expression that matches
> one or more
>                        // whitespace characters
>                        String[] tokens = sc.nextLine().split("\\s+");
>
>                        // make all the strings lower case, and remove
> any excess whitespace
>                        for (int i = 0; i < tokens.length; ++i) {
>                                tokens[i] = tokens[i].toLowerCase().trim();
>                        }
>
>                        // add each token to the map
>                        for (String tok : tokens) {
>                                if (map.containsKey(tok)) {
>                                        map.put(tok, map.get(tok) + 1);
>                                } else {
>                                        map.put(tok, 1);
>                                }
>                        }
>                }
>
>                // remove the empty string if it is present
>                map.remove("");
>
>                // sort the data by storing each word that occurs the
> same number of
>                // times in a TreeMap of sets keyed by count; TreeSet
> stores its
>                // values in sorted order
>                TreeMap<Integer, TreeSet<String>> sortMap = new
> TreeMap<Integer,
> TreeSet<String>>();
>                for (String tok : map.keySet()) {
>                        int count = map.get(tok);
>                        if (sortMap.containsKey(count)) {
>                                TreeSet<String> arr = sortMap.get(count);
>                                arr.add(tok);
>                                sortMap.put(count, arr);
>                        } else {
>                                TreeSet<String> arr = new
> TreeSet<String>();
>                                arr.add(tok);
>                                sortMap.put(count, arr);
>                        }
>                }
>
>                // print the data
>
>                // first reverse the keys to print data in the proper order
>                ArrayList<Integer> idx = new ArrayList<Integer>();
>                idx.addAll(sortMap.keySet());
>                Collections.reverse(idx);
>
>                // print it to stdout
>                for (Integer key : idx) {
>                        TreeSet<String> toks = sortMap.get(key);
>                        for (String t : toks) {
>                                System.out.printf("%s\t%s\n", key, t);
>                        }
>                }
>        }
> }
> _______________________________________________
> Edu-sig mailing list
> Edu-sig at python.org
> http://mail.python.org/mailman/listinfo/edu-sig
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/edu-sig/attachments/20070326/88aa5dd2/attachment.htm 


More information about the Edu-sig mailing list