mrjob v0.2.4 released

Jimmy Retzlaff jimmy at
Wed Mar 9 22:24:57 CET 2011

What is mrjob?

mrjob is a Python package that helps you write and run Hadoop Streaming
jobs.

mrjob fully supports Amazon's Elastic MapReduce (EMR) service, which allows
you to buy time on a Hadoop cluster on an hourly basis. It also works with
your own Hadoop cluster.

Some important features:

 * Run jobs on EMR, your own Hadoop cluster, or locally (for testing).
 * Write multi-step jobs (one map-reduce step feeds into the next)
 * Duplicate your production environment inside Hadoop
   * Upload your source tree and put it in your job's $PYTHONPATH
   * Run make and other setup scripts
   * Set environment variables (e.g. $TZ)
   * Easily install Python packages from tarballs (EMR only)
   * Setup handled transparently by mrjob.conf config file
 * Automatically interpret error logs from EMR
 * SSH tunnel to Hadoop job tracker on EMR
 * Minimal setup
   * To run on your Hadoop cluster, install simplejson and make sure
     $HADOOP_HOME is set.
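
To make the multi-step idea concrete, here is a rough sketch of how the
output of one map-reduce step can feed into the next. It is plain Python
run in memory, not mrjob's API: run_step, run_job and the mapper/reducer
names are all made up for illustration, and a real job would run its
steps on Hadoop rather than in a single process.

```python
from collections import defaultdict


def run_step(records, mapper, reducer):
    """Run one map-reduce step in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Step 1: count how often each word occurs.
def map_words(_, line):
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    yield word, sum(counts)


# Step 2: group the words from step 1 by their count.
def map_by_count(word, count):
    yield count, word


def reduce_words(count, words):
    yield count, sorted(words)


def run_job(lines):
    """Chain the steps: each step's output is the next step's input."""
    steps = [(map_words, reduce_counts), (map_by_count, reduce_words)]
    records = [(None, line) for line in lines]
    for mapper, reducer in steps:
        records = run_step(records, mapper, reducer)
    return records
```

Calling run_job(["a b a", "b c a"]) first produces per-word counts and
then regroups them, giving [(1, ['c']), (2, ['b']), (3, ['a'])].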

More info:

 * Install mrjob: python setup.py install
 * Documentation:
 * PyPI:
 * Development is hosted at GitHub:

What's new?

v0.2.4, 2011-03-09 -- fix bootstrapping mrjob
 * Fix bootstrapping of mrjob in hadoop and local mode (Issue #89)
 * SSH tunnels try to use the same port for the same job flow (Issue #67)
 * Added mr_postfix_bounce and mr_pegasos_svm to examples.
 * Retry on spurious 505s from EMR API
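
For readers curious what retrying on a spurious error can look like, here
is a small sketch of the general pattern. It is not mrjob's actual code:
ApiError, call_with_retries and all of the parameters and their defaults
are assumptions made up for this example.

```python
import time


class ApiError(Exception):
    """Hypothetical stand-in for an error response from a remote API."""

    def __init__(self, status):
        super().__init__("server returned %d" % status)
        self.status = status


def call_with_retries(func, retry_statuses=(505,), max_tries=5,
                      initial_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call func(), retrying with a growing delay on selected statuses.

    Errors with any other status, or the error from the final attempt,
    are re-raised to the caller unchanged.
    """
    delay = initial_delay
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except ApiError as e:
            if e.status not in retry_statuses or attempt == max_tries:
                raise
            sleep(delay)      # wait before the next attempt
            delay *= backoff  # wait longer each time
```

Passing sleep as a parameter keeps the helper easy to exercise without
real waiting, and limiting retries to an explicit set of statuses avoids
hiding errors that are not spurious.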
