mrjob v0.3.0 released
Jimmy Retzlaff
jimmy at retzlaff.com
Thu Dec 8 04:57:16 EST 2011
What is mrjob?
-----------------------
mrjob is a Python package that helps you write and run Hadoop Streaming
jobs.
mrjob fully supports Amazon's Elastic MapReduce (EMR) service, which allows
you to buy time on a Hadoop cluster on an hourly basis. It also works with
your own Hadoop cluster.
Some important features:
* Run jobs on EMR, your own Hadoop cluster, or locally (for testing)
* Write multi-step jobs (one map-reduce step feeds into the next)
* Duplicate your production environment inside Hadoop
  * Upload your source tree and put it in your job's $PYTHONPATH
  * Run make and other setup scripts
  * Set environment variables (e.g. $TZ)
  * Easily install Python packages from tarballs (EMR only)
  * Setup handled transparently by mrjob.conf config file
* Automatically interpret error logs from EMR
* SSH tunnel to Hadoop job tracker on EMR
* Minimal setup
  * To run on EMR, set $AWS_ACCESS_KEY_ID and $AWS_SECRET_ACCESS_KEY
  * To run on your Hadoop cluster, install simplejson and make sure
    $HADOOP_HOME is set
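
To make "one map-reduce step feeds into the next" concrete, here is a
minimal in-memory sketch in plain Python. It does not use mrjob's API; the
names run_step, run_job, and the mapper/reducer functions are hypothetical
and chosen only for illustration. Step 1 counts words and step 2 picks the
most frequent one from step 1's output.

```python
from itertools import groupby
from operator import itemgetter


def run_step(records, mapper, reducer):
    """Run one map-reduce step in memory over (key, value) records."""
    # Map phase: each input record may emit any number of (key, value) pairs.
    mapped = [pair for key, value in records for pair in mapper(key, value)]
    # Shuffle phase: sort by key so equal keys are adjacent, then group them.
    mapped.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        values = (value for _, value in group)
        # Reduce phase: one reducer call per distinct key.
        output.extend(reducer(key, values))
    return output


def count_mapper(_, line):
    for word in line.split():
        yield word.lower(), 1


def count_reducer(word, counts):
    yield word, sum(counts)


def max_mapper(word, count):
    # Send every pair to one constant key so a single reducer call sees all.
    yield "max", (count, word)


def max_reducer(_, count_word_pairs):
    count, word = max(count_word_pairs)
    yield word, count


def run_job(lines):
    """Chain two steps: the output of step 1 is the input of step 2."""
    records = [(None, line) for line in lines]
    steps = [(count_mapper, count_reducer), (max_mapper, max_reducer)]
    for mapper, reducer in steps:
        records = run_step(records, mapper, reducer)
    return records
```

Calling run_job(["a b a", "b a c"]) returns a single (word, count) pair for
the most frequent word. On a real cluster each step would run as a separate
Hadoop Streaming job; the sketch only shows how the data flows between them.
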
More info:
* Install mrjob: pip install mrjob -OR- easy_install mrjob
* Documentation: http://packages.python.org/mrjob/
* PyPI: http://pypi.python.org/pypi/mrjob
* Mailing list: http://groups.google.com/group/mrjob
* Development is hosted at GitHub: http://github.com/Yelp/mrjob
What's new?
--------------------
mrjob v0.3.0 is a major new release. Full details are at
http://packages.python.org/mrjob/whats-new.html - here are a few highlights:
v0.3.0, 2011-12-07
* Combiners
* *_init() and *_final() for mappers, reducers, and combiners
* Custom option parsers
* Job flow pooling on EMR (saves time and money!)
* SSH log fetching
* New EMR diagnostic tools
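
For readers new to combiners, the following plain-Python sketch shows the
general idea: a combiner pre-aggregates each map task's output so fewer
pairs have to be shuffled to the reducers. It is an illustration under
simplifying assumptions, not mrjob's API; run_map_task, run_job, and
group_by_key are hypothetical names.

```python
from collections import defaultdict


def mapper(line):
    for word in line.lower().split():
        yield word, 1


def combiner(word, counts):
    # Runs on one map task's output only, before anything is shuffled.
    yield word, sum(counts)


def reducer(word, counts):
    # Sees the values for a word from all map tasks.
    yield word, sum(counts)


def group_by_key(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def run_map_task(lines, use_combiner):
    """Return the pairs one map task would send to the shuffle."""
    pairs = [pair for line in lines for pair in mapper(line)]
    if not use_combiner:
        return pairs
    combined = []
    for word, counts in group_by_key(pairs).items():
        combined.extend(combiner(word, counts))
    return combined


def run_job(splits, use_combiner):
    """Return (word counts, number of pairs shuffled) for the input splits."""
    shuffled = []
    for lines in splits:
        shuffled.extend(run_map_task(lines, use_combiner))
    result = {}
    for word, counts in group_by_key(shuffled).items():
        for key, total in reducer(word, counts):
            result[key] = total
    return result, len(shuffled)
```

With the splits [["a a b"], ["a b b"]], both settings produce the same
counts, but the run with the combiner shuffles 4 pairs instead of 6. This
works because summing is associative and commutative; a combiner is only
safe when applying it zero or more times leaves the final result unchanged.
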
A big thanks to the contributors to this release: Steve Johnson, Dave
Marin, Wahbeh Qardaji, Derek Wilson, Jordan Andersen, and Benjamin
Goldenberg!