[Baypiggies] my slides

Shannon -jj Behrens jjinux at gmail.com
Fri Mar 27 22:49:03 CET 2009

Here are my elaborate "slides" from yesterday:

This is a random collection of topics related to Python tools.

Talk about the UNIX philosophy:
    Small tools.
    My problems tend to be too large for RAM, but not too big for one
    UNIX and batch processing are a natural fit.
    Multiple processes = multiple CPUs.
    Multiple programming languages = more flexibility.
    Pipes = concurrency without the pain.
    Scales linearly and predictably, unlike databases.
    UNIX tools that already exist are helpful and fast.
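
A minimal sketch of what a "small tool" in this style might look like in Python. The function name and the transformation (upper-casing the first tab-separated field) are made up for illustration. The point is only that the filter streams line by line, so it can sit in a pipe and handle input larger than RAM.

```python
import sys


def uppercase_first_field(lines):
    """Yield each tab-separated line with its first field upper-cased.

    Hypothetical transformation, chosen only to have something to do.
    Accepts any iterable of lines, so memory use stays constant.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        fields[0] = fields[0].upper()
        yield "\t".join(fields) + "\n"


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Stream from stdin to stdout; never hold the whole file in memory.
    stdout.writelines(uppercase_first_field(stdin))

# In a real script, call main() under an `if __name__ == "__main__":` guard.
```

Saved as a script (say, a hypothetical upcase.py), it could then be combined with existing tools, for example `python upcase.py < in.tsv | sort | uniq -c`, with each stage running as its own process.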

Use the optparse module to provide consistent command line APIs:
    Here's an example of the setup from the docs:
        : from optparse import OptionParser
        : parser = OptionParser()
        : parser.add_option("-f", "--file", dest="filename",
        :                   help="write report to FILE", metavar="FILE")
        : parser.add_option("-q", "--quiet",
        :                   action="store_false", dest="verbose",
        :                   help="don't print status messages to stdout")
        : (options, args) = parser.parse_args()
    Here's an example of my own help text:
        : Usage: cleancuttsv.py [options]
        : Options:
        :   -h, --help            show this help message and exit
        :   --assert-head=FIELD1\tFIELD2\t...
        :                         assert that the first line of the file
        :                         matches this
        :   --delete-head         delete the first line of input
        :   -n NUM, --num-fields=NUM
        :                         assert that there are this many fields per
        :   --drop-blank-lines    delete blank lines instead of raising an


    sort -S 20% -T /mnt/some_other_drive ...


    You need a consistent format.
        Most UNIX tools don't understand true TSV, but only an approximation
            My own code raises an exception in cases where it would actually
        Many UNIX tools are ignorant of encoding issues:
            Sometimes playing dumb works and sometimes it hurts.
    Using the csv module:
        : import csv
        : DEFAULT_KARGS = dict(dialect='excel-tab', lineterminator='\n')
        : # Corresponding MySQL LOAD DATA clauses:
        : #     FIELDS TERMINATED BY '\t'
        : #            OPTIONALLY ENCLOSED BY '"'
        : #            ESCAPED BY ''
        : #     LINES TERMINATED BY '\n'
        : def create_default_reader(iterable):
        :     """Return a csv.reader with our default options."""
        :     return csv.reader(iterable, **DEFAULT_KARGS)
        : ...
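
A small round-trip sketch of those default options, to show how the excel-tab dialect handles fields that contain tabs or quotes. create_default_writer is a hypothetical counterpart to create_default_reader, not something shown in the original code, and the sample rows are invented.

```python
import csv
import io

DEFAULT_KARGS = dict(dialect="excel-tab", lineterminator="\n")


def create_default_reader(iterable):
    """Return a csv.reader with our default options."""
    return csv.reader(iterable, **DEFAULT_KARGS)


def create_default_writer(fileobj):
    """Return a csv.writer with our default options (hypothetical helper)."""
    return csv.writer(fileobj, **DEFAULT_KARGS)


# Invented sample data: one plain row, one with an embedded tab,
# one with embedded double quotes.
rows = [["1", "plain"], ["2", "has\ttab"], ["3", 'has "quote"']]

buf = io.StringIO()
create_default_writer(buf).writerows(rows)
text = buf.getvalue()

# newline="" keeps the io layer from translating line endings.
round_tripped = list(create_default_reader(io.StringIO(text, newline="")))
```

The writer wraps the awkward fields in double quotes (and doubles any embedded quotes), so round_tripped equals rows. That quoting is exactly what tools like cut and sort do not understand, which is why they only handle an approximation of TSV.
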
    Using mysqlimport:
        : mysqlimport \
        :     --user=$MYSQL_USERNAME \
        :     --password=$MYSQL_PASSWORD \
        :     --columns=id,name \
        :     --fields-optionally-enclosed-by='"' \
        :     --fields-terminated-by='\t' \
        :     --fields-escaped-by='' \
        :     --lines-terminated-by='\n' \
        :     --local \
        :     --lock-tables \
        :     --replace \
        :     --verbose \
        :     $DATABASE ${BUILD}/sometable.tsv
        To see warnings:

Show pdb in the context of a web app:
    : import pdb
    : from pprint import pprint
    : pdb.set_trace()
    : pprint(request.environ)