Here are my elaborate "slides" from yesterday:<br><br>This is a random collection of topics related to Python tools.<br><br>Talk about the UNIX philosophy:<br> Small tools.<br> My problems tend to be too large for RAM, but not too big for one machine.<br>
UNIX and batch processing are a natural fit.<br> Multiple processes = multiple CPUs.<br> Multiple programming languages = more flexibility.<br> Pipes = concurrency without the pain.<br> Scales linearly and predictably, unlike databases.<br>
UNIX tools that already exist are helpful and fast.<br><br>Use the optparse module (or argparse, which supersedes it in newer Python versions) to provide consistent command-line APIs:<br> Here's an example of the setup from the docs:<br> : from optparse import OptionParser<br>
: parser = OptionParser()<br> : parser.add_option("-f", "--file", dest="filename",<br> : help="write report to FILE", metavar="FILE")<br>
: parser.add_option("-q", "--quiet",<br> : action="store_false", dest="verbose", default=True,<br> : help="don't print status messages to stdout")<br>
: (options, args) = parser.parse_args()<br> Here's an example of my own help text:<br> : Usage: cleancuttsv.py [options]<br> : <br> : Options:<br> : -h, --help show this help message and exit<br>
: --assert-head=FIELD1\tFIELD2\t...<br> : assert that the first line of the file matches this<br> : --delete-head delete the first line of input<br> : -n NUM, --num-fields=NUM<br>
: assert that there are this many fields per line<br> : --drop-blank-lines delete blank lines instead of raising an error<br> :<br><br>sort:<br> <a href="http://jjinux.blogspot.com/2008/08/python-sort-uniq-c-via-subprocess.html">http://jjinux.blogspot.com/2008/08/python-sort-uniq-c-via-subprocess.html</a><br>
sort -S 20% -T /mnt/some_other_drive ...<br> (-S sets the size of sort's in-memory buffer; -T sets the directory for its temporary files.)<br> <a href="http://jjinux.blogspot.com/2008/08/python-memory-conservation-tip-sort.html">http://jjinux.blogspot.com/2008/08/python-memory-conservation-tip-sort.html</a><br>
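 As a rough sketch of the idea behind the sort notes above, Python can hand the heavy lifting to sort and uniq -c through subprocess and only read the small, aggregated result.<br> The function name sort_uniq_count and its parameters are placeholders of mine, not code from the linked posts; buffer_size and tmp_dir are simply passed to sort's -S and -T options.<br>

```python
import os
import subprocess


def sort_uniq_count(in_path, buffer_size=None, tmp_dir=None):
    """Yield (count, line) pairs, like `sort in_path | uniq -c`.

    Hypothetical helper: the data is sorted by the external tools,
    so it never has to fit in this Python process's memory.
    """
    sort_cmd = ["sort"]
    if buffer_size is not None:
        sort_cmd += ["-S", buffer_size]  # size of sort's memory buffer, e.g. "20%"
    if tmp_dir is not None:
        sort_cmd += ["-T", tmp_dir]      # directory for sort's temporary files
    sort_cmd.append(in_path)

    # LC_ALL=C makes both tools compare raw bytes, whatever the locale is.
    env = dict(os.environ, LC_ALL="C")
    sort_proc = subprocess.Popen(sort_cmd, stdout=subprocess.PIPE, env=env)
    uniq_proc = subprocess.Popen(
        ["uniq", "-c"],
        stdin=sort_proc.stdout,
        stdout=subprocess.PIPE,
        env=env,
        universal_newlines=True,
    )
    # Close our copy of the pipe so sort gets SIGPIPE if uniq exits early.
    sort_proc.stdout.close()

    for row in uniq_proc.stdout:
        # uniq -c prints a right-justified count, one space, then the line.
        count, _, line = row.rstrip("\n").lstrip().partition(" ")
        yield int(count), line

    uniq_proc.stdout.close()
    if uniq_proc.wait() != 0 or sort_proc.wait() != 0:
        raise RuntimeError("sort | uniq -c failed")
```

 The two processes run concurrently and talk over a pipe, so Python only ever holds one output row at a time.<br>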
<br>tsv:<br> You need a consistent format.<br> Downsides:<br> Most UNIX tools don't understand true TSV, but only an approximation thereof:<br> My own code raises an exception in cases where it would actually matter.<br>
Many UNIX tools are ignorant of encoding issues:<br> Sometimes playing dumb works and sometimes it hurts.<br> Using the csv module:<br> : import csv<br> : <br> : DEFAULT_KARGS = dict(dialect='excel-tab', lineterminator='\n')<br>
: MYSQL_LOAD_DATA_INFILE_DESC = """\<br> : FIELDS TERMINATED BY '\t'<br> : OPTIONALLY ENCLOSED BY '"'<br> : ESCAPED BY ''<br>
: LINES TERMINATED BY '\n'"""<br> : <br> : def create_default_reader(iterable):<br> : """Return a csv.reader with our default options."""<br>
: return csv.reader(iterable, **DEFAULT_KARGS)<br> : ...<br> Using mysqlimport:<br> : mysqlimport \<br> : --user=$MYSQL_USERNAME \<br> : --password=$MYSQL_PASSWORD \<br> : --columns=id,name \<br>
: --fields-optionally-enclosed-by='"' \<br> : --fields-terminated-by='\t' \<br> : --fields-escaped-by='' \<br> : --lines-terminated-by='\n' \<br>
: --local \<br> : --lock-tables \<br> : --replace \<br> : --verbose \<br> : $DATABASE ${BUILD}/sometable.tsv<br> To see warnings:<br> <a href="http://jjinux.blogspot.com/2009/03/mysql-encoding-hell.html">http://jjinux.blogspot.com/2009/03/mysql-encoding-hell.html</a><br>
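 To make the csv settings in this section concrete, here is a minimal sketch of a writer and reader sharing one set of keyword arguments, plus a check that refuses fields plain UNIX tools would misread.<br> The names TSV_KWARGS, UNSAFE_CHARS, check_row, write_rows, and read_rows are mine for illustration and are not the actual module. I am also assuming that the cases that "actually matter" are fields containing a tab, a newline, a carriage return, or a double quote, since those are what make the csv module quote a field.<br>

```python
import csv
import io

# Same idea as the defaults above: tab-delimited with "\n" line endings.
TSV_KWARGS = dict(dialect="excel-tab", lineterminator="\n")

# Assumption: these characters force csv to quote a field, and tools
# such as cut and sort do not understand that quoting.
UNSAFE_CHARS = ("\t", "\n", "\r", '"')


def check_row(row):
    """Return row unchanged, or raise ValueError if a field needs quoting."""
    for field in row:
        if any(ch in field for ch in UNSAFE_CHARS):
            raise ValueError("field needs quoting: %r" % (field,))
    return row


def write_rows(out_file, rows):
    """Write rows of strings as TSV, validating each row first."""
    writer = csv.writer(out_file, **TSV_KWARGS)
    for row in rows:
        writer.writerow(check_row(row))


def read_rows(in_file):
    """Read TSV rows back into a list of lists of strings."""
    return list(csv.reader(in_file, **TSV_KWARGS))


# Round trip through an in-memory file.
buf = io.StringIO()
write_rows(buf, [["1", "spam"], ["2", "eggs"]])
text = buf.getvalue()
rows = read_rows(io.StringIO(text))
```

 Failing loudly on such fields keeps the files safe to pass through cut, sort, and mysqlimport without any of them having to understand quoting.<br>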
<br>Show pdb in the context of a web app:<br> : import pdb<br> : from pprint import pprint<br> : pdb.set_trace()<br> : pprint(request.environ)<br> <a href="http://localhost:5000/api/ratio">http://localhost:5000/api/ratio</a><br>