[Edu-sig] modules "versus" programs

Tue Apr 20 04:43:22 CEST 2010

I'm sensing some confusion and/or a mish-mash of ideas around
what counts as a Python script, module, library, program.  In Python,
these amount to much the same thing, in that to load a module
is to run a script (even if it doesn't "do" anything).  However, when
people go to write a program, they're often not thinking about how the
very same module might also serve in a library capacity.
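
One common way to let a single file serve in both capacities is the
`__name__` check.  The sketch below is only an illustration: the
file name stats_tools.py and the function names are made up here,
not taken from any real library.

```python
# stats_tools.py: hypothetical module name, for illustration only.

def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def main():
    # Reached only when the file is run as a program, not when imported.
    sample = [2, 4, 6, 8]
    print("mean of", sample, "is", mean(sample))


if __name__ == "__main__":
    main()
```

Run from the command line, the file behaves as a program.  Imported
from a session with `import stats_tools`, it runs top to bottom
without calling main(), and stats_tools.mean() is then available as
a library function.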

Is this because of inherited thinking?  In the good old days, a
library was like (is like) a .dll or .so, i.e. a dynamic link library
or shared object.  Programs would do function calls to
these libraries, which would not be designed to work as
"standalone" programs.  This is how our operating systems
work, commercial code of all kinds.  We do our work atop
mountains of "dependencies" hence "dll hell" and the need
for package managers like Synaptic.

What people discover about Python is the power of the interactive
shell, not just for doodling but for getting some serious work done.
They discover it's possible to import utilities, 3rd party code, and
work with it directly.  There's less of a need to i/o to disk or pipe
stuff from one utility to the next when you're in shell mode, as
here you have your persistent data objects as a part of your
session.

Those of us coming from APL and LOGO are more used to the
idea of a "session", where you interact with the interpreter while
building up a rich environment of session materials.  Smalltalk
also has this idea of a persistent image.  When you come back
to the session, it's more like restoring from hibernation.

A Python module may also be simply for data storage.  If you've
gone to all the trouble to parse in some file, turning all the strings
into numbers, and put those numbers into Python lists, other
data structures, then why go back to that same file to start with?
Take your native Python data and store that as a module, maybe
thousands of lines long.  You could also serialize it as a pickle,
but maybe keeping it human readable is what's important.
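
Here is a rough sketch of that round trip: parse once, write the
result out as Python source, and load it back as a module.  The
sample strings, the file name readings.py and the variable names
are all assumptions made up for the example.

```python
import importlib.util
import pprint
import tempfile
from pathlib import Path

# Stand-in for text read from some original file (made-up sample values).
raw_lines = ["1.5 2.5 3.0", "4 5 6"]

# Turn the strings into numbers once.
rows = [[float(token) for token in line.split()] for line in raw_lines]

# Save the native Python data as human-readable source in a module file.
module_path = Path(tempfile.mkdtemp()) / "readings.py"  # hypothetical name
module_path.write_text("rows = " + pprint.pformat(rows) + "\n")

# Later, load the module instead of parsing the original file again.
spec = importlib.util.spec_from_file_location("readings", str(module_path))
readings = importlib.util.module_from_spec(spec)
spec.loader.exec_module(readings)

print(readings.rows)
```

If the file sits in the current directory or on sys.path, a plain
`import readings` does the same job as the three importlib lines.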

Python data structures are at least as readable as XML a lot
of the time.  We should have more .py files that are actually just
data files.  But then they could have a thin layer of functions
and objects that make manipulation of that data easier.

import cities

might give you all cities in the world with a population > x,
complete with lat/long, populations, other data.  Sure, one
could store all this in a database.  But it's not either/or.
Import the entire CIA World Factbook as a Python module,
why not, with each nation state an object already pre-loaded.
http://www.servinghistory.com/topics/The_CIA_World_Factbook
I point to this data source as it's public domain, so no worries
about converting to Python.
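
A minimal sketch of what such a cities module might contain: plain
data plus a thin layer of functions.  The layout, the function
names, and every entry in the list are placeholders invented for
the example, not real cities or real figures.

```python
# cities.py: a sketch of a data module with a thin layer of functions.
# The entries below are placeholders, not real cities or real figures.
from collections import namedtuple

City = namedtuple("City", "name country lat lon population")

CITIES = [
    City("Alphaville", "Exampleland", 10.0, 20.0, 2500000),
    City("Betatown", "Exampleland", -5.5, 30.25, 750000),
    City("Gammaburg", "Samplestan", 45.0, -120.0, 1200000),
]


def larger_than(min_population):
    """Return the cities whose population exceeds min_population."""
    return [city for city in CITIES if city.population > min_population]


def by_country(country):
    """Return the cities located in the given country."""
    return [city for city in CITIES if city.country == country]
```

In a session, `import cities` followed by
`cities.larger_than(1000000)` would hand back ready-made City
records, with no parsing step in between.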

Say I have a million lines of data in a .py module, and I import
it, that builds a .pyc right?  The .pyc only saves recompiling,
though: importing still runs the module, so the whole data
structure does get built in memory.  The dictionary
data structure should be able to handle it.  I could have
12 million social security numbers keying to names and
addresses, import that as a module, and then retrieve records
by key just like from a database (indeed, a hash table is
a kind of database).  Given how much memory computers
have these days, holding it all in memory wouldn't
necessarily be a problem.
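
A small experiment along these lines, with the caveat that it is
only a sketch: the module name bigdata_demo, the record layout and
the keys are synthetic, and N is kept small.  It suggests that
importing executes the module and builds the whole dictionary up
front; the .pyc caches compiled bytecode rather than acting as an
on-disk data store.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Generate a data module full of synthetic records (keys and values made up).
N = 1000
folder = tempfile.mkdtemp()
lines = ["records = {"]
for i in range(N):
    key = "id-%06d" % i
    lines.append("    %r: (%r, %r)," % (key, "name %d" % i, "address %d" % i))
lines.append("}")
Path(folder, "bigdata_demo.py").write_text("\n".join(lines) + "\n")

# Importing runs the module top to bottom, so the whole dict is built here.
sys.path.insert(0, folder)
importlib.invalidate_caches()
bigdata_demo = importlib.import_module("bigdata_demo")

print(len(bigdata_demo.records))            # all N records are already loaded
print(bigdata_demo.records["id-000042"])    # retrieval by key, hash-table style
print(sys.getsizeof(bigdata_demo.records))  # bytes used by the dict object alone
```

Note that sys.getsizeof reports only the dictionary's own table,
not the keys and tuples it refers to, so the real footprint is
larger than the number printed.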

This is getting back to the "rich data structures" idea.

Maybe we should do baseball modules.  Seems like one
of the main features of baseball is massive amounts of
data and statistics.  What would that look like in Python?
How about teams and players as objects, so you could
import a team object and go team.players() to get back
a list of player objects.  Each one of those would have
season stats etc...
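
To make that concrete, here is one possible shape for it.  This is
a sketch under assumptions: the class names, the attributes, and
the numbers are all invented placeholders, not real teams, players
or statistics.

```python
# A sketch of team and player objects; names and numbers are placeholders.

class Player:
    def __init__(self, name, at_bats, hits):
        self.name = name
        self.at_bats = at_bats
        self.hits = hits

    def batting_average(self):
        """Hits divided by at-bats, or 0.0 when there are no at-bats."""
        return self.hits / self.at_bats if self.at_bats else 0.0


class Team:
    def __init__(self, name, roster):
        self.name = name
        self._roster = list(roster)

    def players(self):
        """Return a list of Player objects, as in team.players()."""
        return list(self._roster)


team = Team("Example Nine", [Player("A. Batter", 100, 30),
                             Player("B. Fielder", 80, 20)])

for player in team.players():
    print(player.name, round(player.batting_average(), 3))
```

A real module would fill the roster from actual season data; the
point is only that the importer gets objects to poke at right away.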

I'm writing this without myself being any kind of huge
baseball fan; I just know some who are, and who might enjoy
programming and computers a lot more if we'd only pump
out more modules full of real data they really care about.

Apropos:
http://mail.python.org/pipermail/python-list/2006-July/1031208.html

Static data that doesn't need updating a lot might be easiest.
I was recently suggesting the Periodic Table would make a
good module.  Constellations and their contained stars, some
info about each star...  of course such modules would take
work.  I'm thinking governments are especially positioned to
serve the schools by this means.  Moving to Python in
technology and humanities subjects (not mutually exclusive)
would mean using modules such as these to better communicate
with a learning, studious public.  Our new kind of math course
would have an easier time getting off the ground if such
resources were to become more available IMO.

Kirby