Tutorial: Brief Introduction to the Standard Library

I'm adding a section to the tutorial with a brief sampling of library offerings and some short examples of how to use them.

My first draft included: copy, glob, shelve, pickle, os, re, math/cmath, urllib, smtplib

Guido's thoughts:
- copy tends to be overused by beginners
- the shelve module has pitfalls for new users
- cmath is rarely needed and some folks are scared of complex numbers
- urllib2 is a better choice than urllib

I'm interested to know what your experiences have been with teaching Python. Which modules are necessary to start doing real work (like pickle and os), which are most easily grasped (like glob or random), which have impressive examples only a few lines long (e.g. urllib), and which might just be fun (turtle would be a candidate if it didn't have a Tk dependency).

Note, re was included because everyone should know it's there and everyone should get advice to not use it when string methods will suffice.

I'm especially interested in thoughts on whether shelve should be included. When I first started out, I was very impressed with shelves because they were the simplest way to add a form of persistence and because they could be dropped in place of a dictionary in scripts that were already built. Also, it was trivially easy to learn based on existing knowledge of dictionaries. OTOH, that existing knowledge is what makes the pitfalls so surprising.

Likewise, I was impressed with the substitutability of line lists, text splits, file.readlines(), and urlopen().

While I think of copy() and deepcopy() as builtins that got tucked away in a module, Guido is right about their rarity in well-crafted code.
Some other candidates (let's pick just two or three):
- csv (basic tool for sharing data with other applications)
- datetime (comes up frequently in real apps and admin tasks)
- ftplib (because the examples are so brief)
- getopt or optparse (because the task is common)
- operator (because otherwise, the functionals can be a PITA)
- pprint (because beauty counts)
- struct (because fixed record layouts are common)
- threading/Queue (because without direction people grab thread and mutexes)
- timeit (because it answers most performance questions in a jiffy)
- unittest (because TDD folks like myself live by it)

I've avoided XML because it is a can of worms and because short examples don't do it justice. OTOH, it *is* the hot topic of the day and seems to be taking over the world one angle bracket at a time.

Ideally, the new section should be relatively short but leave a reader with a reasonable foundation for crafting non-toy scripts.

A secondary goal is to show off the included batteries -- I think it is common for someone to download several languages and choose between them based on their tutorial experiences (so, a little flash and sizzle might be warranted).

Raymond
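The "substitutability" mentioned in the post above can be sketched roughly as follows. This is an illustration only, written for current Python 3 rather than the Python of this thread; the function name count_words and the sample text are made up, and io.StringIO stands in for an open file (or a urlopen() response) so that nothing touches the disk or the network.

```python
import io


def count_words(lines):
    """Count whitespace-separated words in any iterable of text lines."""
    return sum(len(line.split()) for line in lines)


text = "spam eggs\nham\n"

# The same function accepts a list of lines, the result of splitting a
# string, or a file-like object, because all of them iterate over lines.
from_list = count_words(["spam eggs\n", "ham\n"])
from_split = count_words(text.splitlines())
from_file = count_words(io.StringIO(text))  # stands in for an open file
```

All three calls give the same count because the function only relies on being able to loop over its argument.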

Raymond Hettinger writes:
I'm adding section to the tutorial with a brief sampling of library offerings and some short examples of how to use them.
Cool!
Actually, they usually travel in pairs. ;-)

I would stay away from XML for this; there's too much there and how to pick one thing over another isn't always obvious even when someone explains it.

  -Fred

--
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Zope Corporation

On Wednesday 26 November 2003 09:56 pm, Raymond Hettinger wrote:
I'm adding section to the tutorial with a brief sampling of library offerings and some short examples of how to use them.
Great idea.
I would add:

sys -- "real programs" want to access their command-line arguments (sys.argv), want to terminate (sys.exit), want to write to sys.stderr.

fileinput -- users are VERY likely to want to "rewrite textfiles in-place" (as well as wanting to read a bunch of textfiles) and fileinput is just the ticket for that. Users coming from Perl particularly need fileinput desperately as it affords close translation of the "while(<>)" idiom.

cStringIO -- I've noticed most newbies find it more natural to "write to a cStringIO.StringIO pseudofile as they go" then getvalue, rather than append'ing to a list of strings then ''.join .

time, datetime, calendar -- many real programs want to deal with dates and times

array -- many newbies try to use lists to do things that are perfect for array.array
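As a rough sketch of the sys and fileinput points above, assuming current Python 3: the script name "upper.py" and the helper main() are hypothetical, and a temporary file is created only so the example is self-contained.

```python
import fileinput
import os
import sys
import tempfile


def main(argv):
    """Upper-case every line of the named files, rewriting them in place."""
    if len(argv) < 2:
        sys.stderr.write("usage: upper.py FILE...\n")  # hypothetical script name
        return 2
    # With inplace=True, print() output replaces the contents of the file
    # currently being read (the close cousin of Perl's "while(<>)" loop).
    with fileinput.input(files=argv[1:], inplace=True) as lines:
        for line in lines:
            print(line.rstrip("\n").upper())
    return 0


# Demonstrate on a throwaway file.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("spam\neggs\n")

status = main(["upper.py", path])
with open(path) as handle:
    rewritten = handle.read()
os.remove(path)

# A real script would typically end with: sys.exit(main(sys.argv))
```

Returning a status code from main() and handing it to sys.exit keeps the command-line handling easy to test.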
pickle and os), which are most easily grasped (like glob or random), which have impressive examples only a few lines long (i.e. urllib), and
I think zipfile and gzip are easily grasped AND impressive for people who've ever needed to read/write compressed files in other languages. xmlrpclib and SimpleXMLRPCServer are also eye-poppers (and despite their names you don't need to get into XML at all to show them off:-). CGIHTTPServer, while of course not all that suitable for "real programs", has also contributed more than its share in making instant converts to Python, in my experience -- "instant gratification".
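To give an idea of how short a compression example can be, here is a minimal sketch assuming current Python 3, where gzip.compress and gzip.decompress work on bytes in memory (they are newer than this thread). The sample data is made up.

```python
import gzip

# Highly repetitive data compresses well, which makes the effect easy to see.
data = b"spam and eggs\n" * 100

packed = gzip.compress(data)        # gzip-format bytes
restored = gzip.decompress(packed)  # round trip back to the original
```

For files on disk, gzip.open() returns a file-like object that can be read or written like an ordinary file.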
Hmmm, yes, but, with writeback=True, you do work around the most surprising pitfalls (at a price in performance, of course). I dunno -- with so many other impressive modules to show off, maybe shelve might be avoided.
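The pitfall and the writeback=True workaround can be sketched as follows. This assumes current Python 3 (the context-manager form of shelve.open is newer than this thread); the key "langs" and the file name "demo_shelf" are made up for illustration.

```python
import os
import shelve
import shutil
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo_shelf")

# Default behaviour: every lookup returns a freshly unpickled copy, so
# mutating that copy in place does not change what is stored.
with shelve.open(path) as shelf:
    shelf["langs"] = ["python"]
    shelf["langs"].append("perl")      # modifies a temporary copy only
    without_writeback = shelf["langs"]

# With writeback=True, accessed entries are cached and written back on
# close, at a cost in memory and speed.
with shelve.open(path, writeback=True) as shelf:
    shelf["langs"].append("perl")

with shelve.open(path) as shelf:
    with_writeback = shelf["langs"]

shutil.rmtree(workdir)
```

Someone who expects dictionary semantics would be surprised that the first append silently disappears, which is the pitfall under discussion.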
- threading/Queue (because without direction people grab thread and mutexes)
True, they do. But I don't know if the tutorial is the right time to indoctrinate people about proper Python threading architectures.
- timeit (because it answers most performance questions in a jiffy) - unittest (because TDD folks like myself live by it)
Absolute agreement here. And doctest is SO easy to use, that for the limited space of the tutorial it might also be quite appropriate -- it also encourages abundant use of docstrings, a neat thing in itself. Alex
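A minimal doctest sketch, assuming current Python 3 (where / on integers gives a float); the function average is just an illustration.

```python
import doctest


def average(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    >>> average([20, 30, 70])
    40.0
    """
    return sum(values) / len(values)


if __name__ == "__main__":
    # Runs every example found in this module's docstrings and reports
    # only the ones whose output differs from what the docstring shows.
    doctest.testmod()
```

The docstring doubles as documentation and as a test, which is the "abundant use of docstrings" benefit mentioned above.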

[Alex Martelli]
I do not doubt that cStringIO is useful to know, and a tutorial could throw a short glimpse here about why the `c' prefix and speed issues. For a newcomer, here might be a good opportunity for illustrating one surprising capability of Python for those coming from other languages, which is using bound methods as "first-class" objects. Like:

    fragments = []
    write = fragments.append
    ...
    <code using `write' above>
    ...
    result = ''.join(fragments)

I think this approach is not much more difficult than `StringIO', not so bad efficiency-wise, but likely more fruitful for developing useful Python understanding and abilities. A tutorial might also show that the said `write' could be given and received in functions, which do not have to "know" if they are writing to a file, or in-memory fragments.

--
François Pinard   http://www.iro.umontreal.ca/~pinard
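The last point, passing `write' into a function that does not know where its output goes, might look roughly like this. It is a sketch for current Python 3, where io.StringIO takes the place of cStringIO.StringIO; the function name emit_report and the sample items are hypothetical.

```python
import io


def emit_report(write, items):
    """Send formatted lines to any callable that accepts one string."""
    for name, count in items:
        write(f"{name}: {count}\n")


items = [("spam", 3), ("eggs", 5)]

# Bound method of a list: output is collected as fragments and joined later.
fragments = []
emit_report(fragments.append, items)
joined = "".join(fragments)

# Bound method of a file-like object: same function, different destination.
buffer = io.StringIO()
emit_report(buffer.write, items)
```

Passing sys.stdout.write or the write method of an open file would work the same way.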

[I'm Gerrit Holl (18) and I've been using Python for 3-4 years]

Alex Martelli wrote:
time, datetime, calendar -- many real programs want to deal with dates and times
In my opinion, we should not include all three in the tutorial. I think only datetime should be included. datetime has largely the same niche as time, with the difference that datetime is object oriented and time is not. In my opinion, this makes datetime superior to time. Further, I think calendar isn't used a lot...

calendar, format3c, format3cstring, month, monthcalendar, prcal, prmonth, prweek, week, weekheader
    Those mostly copy the unix cal utility. They probably can be useful, but I'm not sure when. Don't most GUIs provide tools for selecting a date from a window?

isleap, leapdays
    Useful functions. Never used them, though.

firstweekday, setfirstweekday
    Don't really know when/why to use them.

timegm
    Doesn't belong here.

I think the calendar module does not contain enough functionality to justify including it in the tutorial. I think datetime does belong in the tutorial, while time and calendar do not.

yours,
Gerrit.

--
242. If any one hire oxen for a year, he shall pay four gur of corn for plow-oxen.
-- 1780 BC, Hammurabi, Code of Law
--
Asperger's Syndrome - a personal approach:
http://people.nl.linux.org/~gerrit/english/
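For comparison, a small sketch of the object-oriented style of datetime next to one of the calendar helpers named above. The date used is the posting date quoted in this thread; the variable names are illustrative.

```python
import calendar
from datetime import date, timedelta

posted = date(2003, 11, 26)

# Dates support arithmetic directly.
week_later = posted + timedelta(days=7)
days_to_new_year = (date(2004, 1, 1) - posted).days

# weekday() counts from Monday == 0.
weekday_index = posted.weekday()

# One of the small calendar helpers discussed above.
leap = calendar.isleap(2004)
```

Subtracting two dates yields a timedelta, so interval calculations need no manual bookkeeping of month lengths.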

Thank you everyone for the ideas on what to include and exclude from the new tutorial section. Attached is a revised draft. Comments, suggestions, nitpicks, complaints, and accolades are welcome. Raymond Hettinger

This version looks great! --Guido van Rossum (home page: http://www.python.org/~guido/)

Raymond Hettinger wrote:
I'm adding section to the tutorial with a brief sampling of library offerings and some short examples of how to use them.
I think it's a great idea.
My first draft included: copy, glob, shelve, pickle, os, re, math/cmath, urllib, smtplib
If one of those is chosen, I'd go for the latter, because it can do more and it's more OO.
Hm, not sure whether this should be in the tutorial.
- timeit (because it answers most performance questions in a jiffy) - unittest (because TDD folks like myself live by it)
- email (because it's impressive and common)
- textwrap (because I love it :) and it's useful)

But of course, it should stay a tutorial, and not become a reference. Users are intelligent enough to skim through the standard library looking for libraries. We should make a selection. Maybe some of them should only be pointed to, without going into detail about how to use it?

yours,
Gerrit.

--
135. If a man be taken prisoner in war and there be no sustenance in his house and his wife go to another house and bear children; and if later her husband return and come to his home: then this wife shall return to her husband, but the children follow their father.
-- 1780 BC, Hammurabi, Code of Law
--
Asperger's Syndrome - a personal approach:
http://people.nl.linux.org/~gerrit/english/

On 26-nov-03, at 21:56, Raymond Hettinger wrote:
My 2 cents (and actually what I plan to do for MacPython, Some Day:-): pick a small number of tutorials where you solve toy versions of real world problems from different domains. For example:

- "publish spreadsheet to website", where you showcase csv and urllib (or maybe the reverse, "turn html table into csv", so you can show htmllib too);
- "analyse some sort of logfile", where you could probably show datetime, re and maybe glob and optparse;
- "something scientific", which could probably show cmath and random and a few others;
- "form mailer", which could show cgi, pprint and email.

I think the advantage of examples from real world problem domains is that people will pick the one that they can relate to, and hence not only will they understand what the problem is all about (i.e. people won't look at a complex number example if they haven't a clue what a complex number is), but also the functionality demonstrated should produce the "aha!" that we're after.

--
Jack Jansen, <Jack.Jansen@cwi.nl>, http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma Goldman
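A toy version of the "publish spreadsheet to website" idea might start like this. It is only a sketch for current Python 3: the spreadsheet contents and the helper to_html_table are made up, the data is read from an in-memory string instead of a real file, and the upload step is left out.

```python
import csv
import html
import io

# Stand-in for a spreadsheet exported as CSV; note the quoted field
# containing a comma, which naive str.split(",") would get wrong.
sheet = io.StringIO('name,qty\nspam,3\n"eggs, large",5\n')
rows = list(csv.reader(sheet))


def to_html_table(rows):
    """Render rows of strings as a very plain HTML table."""
    lines = ["<table>"]
    for row in rows:
        cells = "".join(f"<td>{html.escape(cell)}</td>" for cell in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


table = to_html_table(rows)
```

The csv module handles the quoting rules, which is the part people most often get wrong when parsing by hand.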

Raymond> Some other candidates (let's pick just a two or three):

Raymond> - csv (basic tool for sharing data with other applications)

-0. I think basic usage is covered pretty well in the libref. At best, I'd just mention that it exists and link to the libref.

Raymond> - datetime (comes up frequently in real apps and admin tasks)

+1. Discussing how datetime and time integrate would be useful.

Raymond> - ftplib (because the examples are so brief)

-1. I would think it's rarely used.

Raymond> - getopt or optparse (because the task is common)

+1.

Raymond> - operator (because otherwise, the functionals can be a PITA)

-1. The most common case people need is now covered by a builtin (sum).

Raymond> - pprint (because beauty counts)

+0. Brief mention at best.

Raymond> - struct (because fixed record layouts are common)

-0. Only for propeller heads.

Raymond> - threading/Queue (because without direction people grab thread
Raymond> and mutexes)

+0. I agree that stumbling on thread is too common. OTOH, threads in general are a pretty advanced topic, and probably not real suitable for the tutorial.

Raymond> - timeit (because it answers most performance questions in a
Raymond> jiffy)

-1. Newbies should probably not be worried about performance too much. In addition, I think most performance questions are deeper than those which can be answered by timeit (think naive O(n^2) algorithms).

Raymond> - unittest (because TDD folks like myself live by it)

+1. There's no time like the present to start adding tests.

Raymond> I've avoided XML because it is a can of worms and because short
Raymond> examples don't do it justice. OTOH, it *is* the hot topic of
Raymond> the day and seems to be taking over the world one angle bracket
Raymond> at a time.

-1. XML is too complex for tutorial material.

Skip
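The point about sum covering the most common use of operator can be illustrated with a short sketch. It assumes current Python 3, where reduce lives in functools (it was a builtin when this thread was written); the sample list is made up.

```python
import operator
from functools import reduce

prices = [3, 5, 8]

# The common case: adding things up needs no functional tools at all.
total_builtin = sum(prices)

# The older spelling that sum replaces.
total_reduce = reduce(operator.add, prices, 0)

# A case sum does not cover, where operator still helps.
product = reduce(operator.mul, prices, 1)
```

Whether the remaining cases are common enough to justify tutorial space is the judgment call being debated above.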

participants (9)
- Alex Martelli
- François Pinard
- Fred L. Drake, Jr.
- Gerrit Holl
- Guido van Rossum
- Jack Jansen
- Raymond Hettinger
- Raymond Hettinger
- Skip Montanaro