Python in Education Advocacy Article

Dear Python-in-Education types,

Python will not gain the ground it ought to in educational settings automatically. To do my part, I have decided that my first article in the Python Papers should be about comparing Python to other common first languages for learning and teaching. I'd appreciate your help.

Does language even matter pedagogically? If so, what are Python's advantages and disadvantages? Should there be a universal first language of computing? Should it look more or less or exactly like Python?

To further discussion on this question I have set up a blog. (I hate that blogs are in reverse chronological order; I posted the articles in the opposite order than the reading order so you can read from top to bottom!)

In an effort to focus the conversation, I've asked a few language comparison questions that are relevant to Python's suitability as an educational language, each in its own blog entry. Please have a look, and make a stab at short answers to the questions. (If you have a long answer, please just link to it.)

My idea is not to restrict people's ideas but to provide some structure for the conversation. It will be very interesting to see if this helps move the conversation forward. The best outcome would be if we discover "low-hanging fruit" to advance the cause.

Please lend a hand. See http://pencilscience.blogspot.com/

with pythonical regards,
Michael Tobis

Michael,

On the sigcse list there was recently a coding challenge (http://www.cs.duke.edu/csed/code) that asked for solutions to a word frequency problem in various languages. I believe the author is planning to eventually list all the solutions received (entries came in many different languages); that could make for some interesting comparisons.

I wrote one solution in Python, and one in Java, which I give below; needless to say, I found the Python version far easier to write.

Toby

----------------------------------------------------------------------
import sys

if __name__ == '__main__':
    # get the path from the command-line
    fname = sys.argv[1]

    # read in the file as a list of tokens
    tokens = [tok.strip().lower() for tok in open(fname, 'r').read().split()]

    # calculate the frequency of each token
    freq = {}
    for tok in tokens:
        if tok in freq:
            freq[tok] += 1
        else:
            freq[tok] = 1

    # Sort the list in highest frequency to lowest frequency,
    # with ties sorted by lexicographic order of the words.
    # Uses the Python sort-decorate-unsort idiom. We sort by
    # the negation of the frequency values to get the proper
    # ordering.
    lst = [(-freq[tok], tok) for tok in freq]   # decorate
    lst.sort()                                  # sort
    lst = [(-freq, tok) for freq, tok in lst]   # undecorate

    # print the results
    for freq, tok in lst:
        print '%s\t%s' % (freq, tok)
----------------------------------------------------------------------
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;
import java.util.TreeMap;
import java.util.TreeSet;

public class Puzzle {

    public static void main(String[] args) throws IOException {
        // get the file to process
        String fname = args[0];

        Scanner sc = new Scanner(new File(fname));

        // initialize the map for the words and counts;
        // a TreeMap is always ordered by keys
        TreeMap<String, Integer> map = new TreeMap<String, Integer>();

        // process the file a line at a time
        while (sc.hasNextLine()) {
            // chop each line into its constituent tokens
            // "\\s+" is a regular expression that matches one or more
            // whitespace characters
            String[] tokens = sc.nextLine().split("\\s+");

            // make all the strings lower case, and remove any excess whitespace
            for (int i = 0; i < tokens.length; ++i) {
                tokens[i] = tokens[i].toLowerCase().trim();
            }

            // add each token to the map
            for (String tok : tokens) {
                if (map.containsKey(tok)) {
                    map.put(tok, map.get(tok) + 1);
                } else {
                    map.put(tok, 1);
                }
            }
        }

        // remove the empty string if it is present
        map.remove("");

        // sort the data by storing each word that occurs the same number of
        // times in a TreeMap of sets keyed by count; TreeSet stores its
        // values in sorted order
        TreeMap<Integer, TreeSet<String>> sortMap = new TreeMap<Integer, TreeSet<String>>();
        for (String tok : map.keySet()) {
            int count = map.get(tok);
            if (sortMap.containsKey(count)) {
                TreeSet<String> arr = sortMap.get(count);
                arr.add(tok);
                sortMap.put(count, arr);
            } else {
                TreeSet<String> arr = new TreeSet<String>();
                arr.add(tok);
                sortMap.put(count, arr);
            }
        }

        // print the data
        // first reverse the keys to print data in the proper order
        ArrayList<Integer> idx = new ArrayList<Integer>();
        idx.addAll(sortMap.keySet());
        Collections.reverse(idx);

        // print it to stdout
        for (Integer key : idx) {
            TreeSet<String> toks = sortMap.get(key);
            for (String t : toks) {
                System.out.printf("%s\t%s\n", key, t);
            }
        }
    }
}
----------------------------------------------------------------------
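For comparison, here is a minimal Python 3 sketch of the same word-frequency task. It is not one of the contest entries: the function name `word_frequencies` and the sample sentence are assumptions made for illustration. It counts with `collections.Counter` and sorts on the key `(-count, word)`, which yields the same ordering as negating the frequencies in the sort-decorate-unsort version.

```python
from collections import Counter


def word_frequencies(text):
    """Return (count, word) pairs, highest count first, ties alphabetical."""
    # Lowercase and split on whitespace, as both solutions above do.
    counts = Counter(text.lower().split())
    # Sorting on (-count, word) plays the same role as the negated
    # frequencies in the decorate-sort-undecorate version.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(count, word) for word, count in ordered]


if __name__ == "__main__":
    # Hypothetical sample input, not the contest data.
    sample = "the cat and the dog and the bird"
    for count, word in word_frequencies(sample):
        print("%s\t%s" % (count, word))
```

The key function avoids building and unpacking the intermediate decorated list, at the cost of hiding the idiom that the original makes explicit.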

<grin>

Seven lines seems reasonable to me:

##########
import sys
concord = {}
for word in [token.lower() for token in open(sys.argv[1],"r").read().split()]:
    concord[word] = concord.get(word,0) + 1
result = sorted([(item[1],item[0]) for item in concord.items()],reverse=True)
for pair in result:
    print "%s\t%s" % pair
###########

While I wonder if the problem wasn't set up to be in Python's favor, this sort of thing is a big plus for Python in daily use.

Terseness, though, is not enough in general and certainly not in education. Perl makes a virtue of terseness, and I have a strong sense it is not very useful as a first language.

mt

Actually, it appears your code makes a common error: according to the problem specification, when frequencies are tied, they should be ordered alphabetically. But your code orders tied frequencies in reverse, e.g.

7 which # oops: wrong order
7 this
7 there
7 one
7 me
7 by
7 about
7 ``the

Toby

--
Dr. Toby Donaldson
School of Computing Science
Simon Fraser University (Surrey)
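The tie-ordering problem is easy to reproduce on a small input. The sketch below is in Python 3 and uses made-up counts (the word "zebra" and the helper names are assumptions for illustration); it contrasts sorting `(count, word)` pairs with `reverse=True` against sorting on a key that negates only the count.

```python
def sort_reversed(counts):
    """Sort (count, word) pairs descending; ties end up reverse-alphabetical."""
    # reverse=True flips the whole tuple comparison, words included.
    return sorted(((c, w) for w, c in counts.items()), reverse=True)


def sort_ties_alphabetical(counts):
    """Sort by count descending, with tied words in alphabetical order."""
    # Only the count is negated, so the word comparison stays ascending.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(c, w) for w, c in ordered]


# Hypothetical counts chosen to include a three-way tie.
sample = {"which": 7, "this": 7, "about": 7, "zebra": 9}
```

With this sample, `sort_reversed` lists the tied words as which, this, about, while `sort_ties_alphabetical` lists them as about, this, which.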

Oof. (Thanks.) :-}

Proving once again that eyeballing the test is not running the test!

As penance I have reduced it from seven lines to six. This one is actually tested.

########
import sys
concord = {}
for word in [token.lower() for token in open(sys.argv[1],"r").read().split()]:
    concord[word] = concord.get(word,0) + 1
result = sorted(concord.items(), lambda x,y: cmp(-x[1],-y[1]) or cmp(x[0],y[0]) )
print "\n".join( ["%s\t%s" % pair[-1::-1] for pair in result] )

# mt
########

On 3/26/07, Toby Donaldson <tjd@sfu.ca> wrote:
Actually, it appears your code makes a common error: according to the problem specification, when frequencies are tied, they should be ordered alphabetically.
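The comparison-function form of `sorted` used in the six-line version, and the builtin `cmp`, exist only in Python 2. A rough Python 3 equivalent, offered as a sketch and not as the author's code, wraps the same comparison in `functools.cmp_to_key`. The helper names `cmp`, `by_count_then_word`, and `format_concordance` are assumptions introduced here.

```python
from functools import cmp_to_key


def cmp(a, b):
    """Three-way comparison, standing in for the Python 2 builtin cmp()."""
    return (a > b) - (a < b)


def by_count_then_word(x, y):
    # x and y are (word, count) pairs. Compare counts in descending order
    # first; `or` falls through to the word comparison only when the
    # counts are equal, because cmp() then returns 0.
    return cmp(-x[1], -y[1]) or cmp(x[0], y[0])


def format_concordance(concord):
    """Return one 'count<TAB>word' line per entry of the dict `concord`."""
    result = sorted(concord.items(), key=cmp_to_key(by_count_then_word))
    return "\n".join("%s\t%s" % (count, word) for word, count in result)
```

The `cmp(...) or cmp(...)` chain is the part worth showing students: it reads as "order by the first criterion, and only if that is a tie, by the second".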

Michael Tobis wrote:
As penance I have reduced it from seven lines to six. This one is actually tested.
You know, when confronted with this type of little one-off problem before, I used to write Perl code like this:

for (1..5){@d = sort split //,($_+(chomp($_=<STDIN>))*0);
$t=(@d%2)?($d[int(@d/2)]):($d[@d/2-1]);
s/$t/($t<3)?$d[-1]:(($t<6)?$d[0]:(($t<9)?(sub{$s+=$_for(@d);$s%10}->()):0))/e;
print $_+0;}

So when I see code that reads about as easily on a Python list, I think it might be time to tell people to step back, take a deep breath, and remember there's a reason they're using Python -- and it's not reducing the LOC count ;)

--
Ivan Krstić <krstic@solarsail.hcs.harvard.edu> | GPG: 0x147C722D

On 3/27/07, Ivan Krstić <krstic@solarsail.hcs.harvard.edu> wrote:
So when I see code that reads about as easily on a Python list, I think it might be time to tell people to step back, take a deep breath, and remember there's a reason they're using Python -- and it's not reducing the LOC count ;)
While I think that is a bit harsh on my little hack, your point is well taken. (And let me say that it is an honor and a pleasure to hear from you!)

However, it's a little frustrating; you say "there's a reason" but you leave me to guess what you think that reason might be.

What I am looking for is people actually *articulating* what they like (and dislike!) about Python, especially in an educational context.

(Am I asking people to do my homework for me? Well, yeah, sure. I am not claiming to be the smartest person who has an interest in this topic, and I am certainly not the most experienced, but I still want to produce an article that moves the field forward a bit. The more help I can get, the better.)

I see that:

"Ivan is a strong advocate of open source software and software libre. He thinks Python may well be the greatest thing since sliced bread." (http://blogs.law.harvard.edu/ivan/)

I agree about the sliced bread thing. I'd love to know why that is, though, if you can spare a few minutes to try to articulate it.

mt

To further discussion on this question I have set up a blog. (I hate that blogs are in reverse chronological order; I posted the articles in the opposite order than the reading order so you can read from top to bottom!)
I'd rather post thoughts to edu-sig if you don't mind, as your questions are germane and in line with questions we've been asking ourselves on this list. Feel free to add a link from your blog (the edu-sig archive is public and open) as I might from mine (I link to edu-sig fairly routinely).
From my point of view, as someone who teaches Python both pre- and post-college, thinking in terms of objects is somewhat intuitive (which is why OO is appealing in the first place) and is the anchoring paradigm I strive to impart, with Python the modeling language.

Within OO, students have many choices of other languages *and* have a handle on what a "programming language paradigm" means, so when encountering a non-OO language, students will, we hope, recognize how that's not just a superficial/semantic thing, but a whole "way of looking" thing (Wittgenstein) involving "gestalts". That being said, we encourage forays into non-OO languages (I'm not a religious fanatic; I believe in wandering and exploring).

OK, so that's to put things at a meta "teacher training" level. In practical reality, this approach entails putting a lot of stress on "dot notation" because that's what they'll find in C#, C++, JavaScript, Java and others, or some other symbol playing pretty much the exact same role.

Interact with the primitive or native objects Python gives you, learn dot notation interactively in the shell as a "user". Then, after an interlude with functions (including functions that eat and return functions), move to rolling your own class definitions, using __rib__ syntax to savor what it means to be "on the inside" in the object creation business. [1]

Now we've talked before on this list about how some CS0 types don't want to dive into OO until maybe CS1, and how one reason Python is better than other languages is that it *doesn't* force an OO way of looking; it can be used by those more into other paradigms or no paradigm at all.

Be that as it may, and resisting the urge to defend my approach, I'll just say that in *my* curriculum (not necessarily CS), which I'm in the process of propagating more widely, via screencasts especially, it's OO that's really a starting point (because of the "math objects" we use -- vectors, polynomials, fractals and so forth). Python takes precedence *because* it provides such a clear, uncluttered view of that paradigm.

Anyway, that's just *my* little success story (my work is bearing lots of fruit), YMMV. [2]

Kirby
http://www.4dsolutions.net/ocn/cp4e.html

[1] http://controlroom.blogspot.com/search?q=python
[2] YMMV = your mileage may vary
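The progression described here (dot notation as a user, then functions that take and return functions, then class definitions with double-underscore methods) can be sketched in a few lines of Python 3. This is an illustration only, not material from Kirby's curriculum; the `Vector` class, its `scaled` method, and the `compose` helper are hypothetical names chosen for the example.

```python
# Step 1: dot notation as a "user" of native objects.
shout = "python".upper()          # calling a method on a built-in str


# Step 2: a function that eats two functions and returns a new one.
def compose(f, g):
    """Return a function that applies g first, then f."""
    return lambda value: f(g(value))


# Step 3: rolling your own class, with __ribs__ on the inside.
class Vector:
    """A minimal 2-D vector, as one example of a "math object"."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Invoked by the expression: vector + other
        return Vector(self.x + other.x, self.y + other.y)

    def __repr__(self):
        # Controls how the object is displayed in the interactive shell.
        return "Vector(%r, %r)" % (self.x, self.y)

    def scaled(self, factor):
        """Return a new vector stretched by `factor`."""
        return Vector(self.x * factor, self.y * factor)
```

In the shell, `Vector(1, 2) + Vector(3, 4)` displays as `Vector(4, 6)`, so the student sees the special methods being triggered by ordinary syntax.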

Thanks. Kirby's is an interesting response with which I basically agree.

Those of us who learned programming when the model was close to the machine model (Fortran, C) have had a hard time wrapping our heads around objects. A good friend whom I very much admire has argued (this was pre-Python so I am not sure whether he stands by this) that OOP is for experts and shouldn't be foisted on beginners at all.

I have seen the consequences of premature stress on OOP: supposedly professional programmers writing gobs of Fortran-like code in Java; great hulking monstrosities of procedural code wrapped in "public static void main" for bureaucratic reasons, to convince javac to allow the Fortran-like code through. The blame goes to the instructor who did not manage to understand what advantages the object model brings to the table at all. The resulting Java code provides the worst of both worlds: all the overhead of OOP and Java with none of the advantages.

Python's gentle introduction to objects makes the distinction between procedural and OO approaches visible gradually. You don't have to **explain** the concept, which is very difficult when the learner lacks the experience. The usual examples (a car is-a vehicle, a car has-a motor) seem utterly pointless, and yet abstractions which actually add substantial value are not easily seen by people who haven't felt the need for them.

With Python you aren't left in a position of trying to explain the benefits of OOP after people have developed non-objecty habits and ways of thinking, nor before they have a need for the abstractions that you enforce. The veil is lifted gradually, at the right time and in the right measure. This is a remarkable feature.

It strikes me that this ties in with something Ian Bicking once blogged:

"This is often how I feel about object-oriented programming, and somewhere where I think Python's imperative approach is a real feature. Starting on an application by building objects (or doing "whole design") is a bad idea, unless you know the domain well. It's better to get it doing what you need to do, and think about objects along the way. Somewhere you'll notice you keep passing around the same value(s) to a set of functions -- those set of values are "self", and those functions are methods. You'll find ways to split things into modules, etc. Designing objects and modules too early and you'll have lots of circular dependencies and poor metaphors. Of course, you can fix all those, but it's easier to add structure to a naive project than to reshape and restructure a misdeveloped project."

The similarity between these arguments is striking, even though they are aimed at very different groups of code developers.

Which leads to another argument in Python's favor: the continuity of approach between very small and very large projects, and between very simple and very complex goals. The fact that we can offer beginners a language which has some traction as vocational training is something of a mixed blessing if we think computer programming is "for everybody", but the fact remains that Python is not a "training wheel" language. Without compromise, it supports the entire range of activity from casual dabbling all the way to the most sophisticated abstractions. This, I think, is unique.

mt
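Bicking's observation, that the value you keep passing to a set of functions is "self" and those functions are methods, can be made concrete with a small before-and-after sketch in Python 3. The example reuses the word-count theme of this thread; the names `add_word`, `most_common`, and `Concordance` are assumptions made for illustration and come from neither Bicking nor the earlier posts.

```python
# Procedural first draft: every function takes the same `counts` dict.
def add_word(counts, word):
    counts[word] = counts.get(word, 0) + 1


def most_common(counts):
    # Highest count first, ties in alphabetical order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Later refactoring: the value that kept being passed around becomes
# `self`, and the functions that received it become methods.
class Concordance:
    def __init__(self):
        self.counts = {}

    def add_word(self, word):
        self.counts[word] = self.counts.get(word, 0) + 1

    def most_common(self):
        return sorted(self.counts.items(), key=lambda item: (-item[1], item[0]))
```

Both versions do the same work; the class only names a structure that was already visible in the function signatures, which is the sense in which the structure is added along the way.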
participants (4)
- Ivan Krstić
- kirby urner
- Michael Tobis
- Toby Donaldson