<div>What is mrjob?</div><div>-----------------------</div><div>mrjob is a Python package that helps you write and run Hadoop Streaming jobs.</div><div><br></div><div>mrjob fully supports Amazon's Elastic MapReduce (EMR) service, which allows you to buy time on a Hadoop cluster on an hourly basis. It also works with your own Hadoop cluster.</div>
<div><br></div><div>Some important features:</div><div><br></div><div> * Run jobs on EMR, your own Hadoop cluster, or locally (for testing)</div><div> * Write multi-step jobs (one map-reduce step feeds into the next)</div>
<div> * Duplicate your production environment inside Hadoop:</div><div>&nbsp;&nbsp;&nbsp;* Upload your source tree and put it in your job's $PYTHONPATH</div><div>&nbsp;&nbsp;&nbsp;* Run make and other setup scripts</div><div>&nbsp;&nbsp;&nbsp;* Set environment variables (e.g. $TZ)</div>
<div>&nbsp;&nbsp;&nbsp;* Easily install Python packages from tarballs (EMR only)</div><div>&nbsp;&nbsp;&nbsp;* Setup is handled transparently by the mrjob.conf config file</div><div> * Automatically interpret error logs from EMR</div><div> * SSH tunnel to the Hadoop job tracker on EMR</div>
<div> * Minimal setup:</div><div>&nbsp;&nbsp;&nbsp;* To run on EMR, set $AWS_ACCESS_KEY_ID and $AWS_SECRET_ACCESS_KEY</div><div>&nbsp;&nbsp;&nbsp;* To run on your own Hadoop cluster, install simplejson and make sure $HADOOP_HOME is set</div><div><br></div>
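<div>To make "one map-reduce step feeds into the next" concrete, here is a minimal in-memory sketch in plain Python. It does not use mrjob's API; run_step, most_common_word, and the mapper and reducer names are hypothetical helpers made up for illustration. Step 1 counts words, and step 2 takes those counts as its input and picks the largest.</div>
<pre>
```python
from collections import defaultdict


def run_step(records, mapper, reducer):
    """Run one map-reduce step over (key, value) records, in memory."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)
    output = []
    # Visit keys in sorted order, one reducer call per key.
    for out_key in sorted(grouped):
        output.extend(reducer(out_key, grouped[out_key]))
    return output


# Step 1: count how often each word occurs.
def count_mapper(_, line):
    for word in line.split():
        yield word.lower(), 1


def count_reducer(word, counts):
    yield word, sum(counts)


# Step 2: send every (count, word) pair to a single key and keep the maximum.
def max_mapper(word, count):
    yield None, (count, word)


def max_reducer(_, count_word_pairs):
    yield max(count_word_pairs)


def most_common_word(lines):
    """Chain the two steps: the output of step 1 is the input of step 2."""
    records = [(None, line) for line in lines]
    word_counts = run_step(records, count_mapper, count_reducer)
    return run_step(word_counts, max_mapper, max_reducer)
```
</pre>
<div>Calling most_common_word(["a b a", "b a c"]) returns a one-element list holding the (count, word) pair for "a". In a real job, each step would run as its own Hadoop Streaming pass rather than as a function call.</div><div><br></div>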
<div>More info:</div><div><br></div><div> * Install mrjob: pip install mrjob -OR- easy_install mrjob</div><div> * Documentation: <a href="http://packages.python.org/mrjob/">http://packages.python.org/mrjob/</a></div><div>
* PyPI: <a href="http://pypi.python.org/pypi/mrjob">http://pypi.python.org/pypi/mrjob</a></div><div> * Mailing list: <a href="http://groups.google.com/group/mrjob">http://groups.google.com/group/mrjob</a></div><div> * Development is hosted on GitHub: <a href="http://github.com/Yelp/mrjob">http://github.com/Yelp/mrjob</a></div>
<div><br></div><div>What's new?</div><div>--------------------</div><div>mrjob v0.3.0 is a major new release. Full details are at <a href="http://packages.python.org/mrjob/whats-new.html">http://packages.python.org/mrjob/whats-new.html</a>. Here are a few highlights:</div>
<div><br></div><div>v0.3.0, 2011-12-07</div><div> * Combiners</div><div> * *_init() and *_final() for mappers, reducers, and combiners</div><div> * Custom option parsers</div><div> * Job flow pooling on EMR (saves time and money!)</div>
<div> * SSH log fetching</div><div> * New EMR diagnostic tools</div><div><br></div><div>A big thanks to the contributors to this release: Steve Johnson, Dave Marin, Wahbeh Qardaji, Derek Wilson, Jordan Andersen, and Benjamin Goldenberg!</div>
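<div><br></div><div>For readers new to combiners, the first highlight above, here is a rough plain-Python sketch of what one buys you. It does not use mrjob's API; mapper, sum_values, and word_count are hypothetical names made up for illustration. A combiner pre-aggregates each mapper's output before it is sent to the reducers, so fewer records cross the network while the final result stays the same.</div>
<pre>
```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield word, 1


def sum_values(pairs):
    """Sum values per key; used here as both the combiner and the reducer."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def word_count(splits, use_combiner):
    """Return (final counts, number of records sent to the reducer)."""
    shuffled = []
    for split in splits:  # each split stands in for one mapper task
        mapped = [pair for line in split for pair in mapper(line)]
        # With a combiner, each task sums its own output before the shuffle.
        shuffled.extend(sum_values(mapped) if use_combiner else mapped)
    return sum_values(shuffled), len(shuffled)
```
</pre>
<div>With splits = [["a a b"], ["a b b b"]], both settings give the same counts, but the reducer receives 7 records without the combiner and 4 with it. This only works because summing can be applied to partial results in any grouping; a combiner has to be safe to run zero or more times.</div>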