[Baypiggies] my slides
Shannon -jj Behrens
jjinux at gmail.com
Fri Mar 27 22:49:03 CET 2009
Here are my elaborate "slides" from yesterday:
This is a random collection of topics related to Python tools.
Talk about the UNIX philosophy:
Small tools.
My problems tend to be too large for RAM, but not too big for one
machine.
UNIX and batch processing are a natural fit.
Multiple processes = multiple CPUs.
Multiple programming languages = more flexibility.
Pipes = concurrency without the pain.
Scales linearly and predictably, unlike databases.
UNIX tools that already exist are helpful and fast.
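As an illustration of the small-tools idea, here is a minimal sketch of a filter that streams lines from one file object to another, so memory use stays flat no matter how large the input is. The function names and the choice of dropping blank lines are made up for this example; they are not from the talk.

```python
import sys


def drop_blank_lines(infile, outfile):
    """Copy infile to outfile one line at a time, skipping blank lines.

    Returns the number of lines that were dropped.
    """
    dropped = 0
    for line in infile:
        if line.strip():
            outfile.write(line)
        else:
            dropped += 1
    return dropped


def main():
    # Hypothetical entry point: read stdin, write stdout, report on stderr
    # so the report does not pollute the data flowing down the pipe.
    dropped = drop_blank_lines(sys.stdin, sys.stdout)
    sys.stderr.write("dropped %d blank lines\n" % dropped)
```

If a script called main(), it could sit in the middle of a shell pipeline, with each stage running as its own process.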
Use the optparse module to provide consistent command line APIs:
Here's an example of the setup from the docs:
: from optparse import OptionParser
: parser = OptionParser()
: parser.add_option("-f", "--file", dest="filename",
:                   help="write report to FILE", metavar="FILE")
: parser.add_option("-q", "--quiet",
:                   action="store_false", dest="verbose", default=True,
:                   help="don't print status messages to stdout")
: (options, args) = parser.parse_args()
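To show what parse_args() hands back, here is a small sketch that wraps the same two options in a function and parses an explicit argument list instead of sys.argv. The build_parser name and the sample arguments are assumptions made for illustration.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser with the same two options as the docs example."""
    parser = OptionParser()
    parser.add_option("-f", "--file", dest="filename",
                      help="write report to FILE", metavar="FILE")
    parser.add_option("-q", "--quiet",
                      action="store_false", dest="verbose", default=True,
                      help="don't print status messages to stdout")
    return parser


# Passing a list makes the parser easy to exercise without a real
# command line. These sample arguments are made up.
options, args = build_parser().parse_args(
    ["--quiet", "--file", "report.txt", "leftover"])

# Option values land on attributes named by dest=...; anything that is
# not an option comes back in args.
print(options.filename, options.verbose, args)
```

Keeping the parser in its own function also makes it simple to give every tool in a pipeline the same flags.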
Here's an example of my own help text:
: Usage: cleancuttsv.py [options]
:
: Options:
:   -h, --help            show this help message and exit
:   --assert-head=FIELD1\tFIELD2\t...
:                         assert that the first line of the file matches this
:   --delete-head         delete the first line of input
:   -n NUM, --num-fields=NUM
:                         assert that there are this many fields per line
:   --drop-blank-lines    delete blank lines instead of raising an error
:
sort:
http://jjinux.blogspot.com/2008/08/python-sort-uniq-c-via-subprocess.html
sort -S 20% -T /mnt/some_other_drive ...
http://jjinux.blogspot.com/2008/08/python-memory-conservation-tip-sort.html
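A rough sketch of the general idea of handing sorting and counting to the UNIX tools from Python follows. It is not the code from the posts linked above; the function name is hypothetical, and it assumes sort and uniq are on the PATH. LC_ALL=C is set so the sort order is plain byte order rather than locale dependent.

```python
import os
import subprocess


def sort_uniq_count(lines):
    """Return (count, line) pairs by piping lines through `sort | uniq -c`."""
    env = dict(os.environ, LC_ALL="C")
    sort = subprocess.Popen(["sort"], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, env=env)
    uniq = subprocess.Popen(["uniq", "-c"], stdin=sort.stdout,
                            stdout=subprocess.PIPE, env=env)
    # Drop our copy of the pipe between the two children so that only
    # uniq holds the read end.
    sort.stdout.close()

    # sort emits nothing until it sees end of input, so it is safe to
    # write everything first and read the results afterwards.
    for line in lines:
        sort.stdin.write(line.encode("utf-8") + b"\n")
    sort.stdin.close()

    results = []
    for raw in uniq.stdout:
        # uniq -c prints a right-aligned count, one space, then the line.
        text = raw.decode("utf-8").rstrip("\n").lstrip(" ")
        count, _, value = text.partition(" ")
        results.append((int(count), value))
    uniq.stdout.close()

    if sort.wait() != 0 or uniq.wait() != 0:
        raise RuntimeError("sort | uniq -c failed")
    return results
```

Flags such as the -S and -T shown above could be appended to the sort argument list when the input is too big to sort in RAM.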
tsv:
You need a consistent format.
Downsides:
Most UNIX tools don't understand true TSV, but only an approximation
thereof:
My own code raises an exception in cases where it would actually
matter.
Many UNIX tools are ignorant of encoding issues:
Sometimes playing dumb works and sometimes it hurts.
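One way to raise an exception in the cases that matter is to refuse any field that plain, unquoted TSV cannot represent. This is only a sketch: the function names are hypothetical, and the set of rejected characters (tab, newline, carriage return, double quote) is an assumption about what would confuse simple tools such as cut or sort.

```python
# Characters that a line-and-tab oriented tool would misread inside a
# field. This set is an assumption made for the sketch.
UNSAFE_CHARS = ("\t", "\n", "\r", '"')


def check_simple_tsv_field(value):
    """Return value unchanged, or raise ValueError if it is not safe."""
    for ch in UNSAFE_CHARS:
        if ch in value:
            raise ValueError(
                "field %r contains %r, which plain TSV cannot hold"
                % (value, ch))
    return value


def format_simple_tsv_row(fields):
    """Join already-checked fields into one tab-separated line."""
    return "\t".join(check_simple_tsv_field(f) for f in fields) + "\n"
```

Failing loudly here means the approximate TSV that the UNIX tools see is always identical to the true TSV.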
Using the csv module:
: import csv
:
: DEFAULT_KARGS = dict(dialect='excel-tab', lineterminator='\n')
: MYSQL_LOAD_DATA_INFILE_DESC = """\
: FIELDS TERMINATED BY '\t'
: OPTIONALLY ENCLOSED BY '"'
: ESCAPED BY ''
: LINES TERMINATED BY '\n'"""
:
: def create_default_reader(iterable):
:     """Return a csv.reader with our default options."""
:     return csv.reader(iterable, **DEFAULT_KARGS)
: ...
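To see what those defaults do, here is a small round trip through in-memory buffers. create_default_writer is a hypothetical counterpart to the reader above; the elided part of the module is not shown in the slides and may look different.

```python
import csv
import io

DEFAULT_KARGS = dict(dialect='excel-tab', lineterminator='\n')


def create_default_reader(iterable):
    """Return a csv.reader with our default options."""
    return csv.reader(iterable, **DEFAULT_KARGS)


def create_default_writer(fileobj):
    """Hypothetical counterpart: a csv.writer with the same options."""
    return csv.writer(fileobj, **DEFAULT_KARGS)


buf = io.StringIO()
writer = create_default_writer(buf)
writer.writerow(["1", "Alice"])
# A field containing a tab gets wrapped in double quotes by the writer.
writer.writerow(["2", "tab\there"])
text = buf.getvalue()

# Reading the text back recovers the original fields.
rows = list(create_default_reader(io.StringIO(text)))
```

The quoted second row is an example of true TSV that a tool like cut would split in the wrong place.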
Using mysqlimport:
: mysqlimport \
: --user=$MYSQL_USERNAME \
: --password=$MYSQL_PASSWORD \
: --columns=id,name \
: --fields-optionally-enclosed-by='"' \
: --fields-terminated-by='\t' \
: --fields-escaped-by='' \
: --lines-terminated-by='\n' \
: --local \
: --lock-tables \
: --replace \
: --verbose \
: $DATABASE ${BUILD}/sometable.tsv
To see warnings:
http://jjinux.blogspot.com/2009/03/mysql-encoding-hell.html
Show pdb in the context of a web app:
: import pdb
: from pprint import pprint
: pdb.set_trace()
: pprint(request.environ)
http://localhost:5000/api/ratio