Suggestions for Python MapReduce?

Casey Webster Caseyweb at gmail.com
Wed Jul 22 09:23:33 EDT 2009


On Jul 22, 5:27 am, Phillip B Oldham <phillip.old... at gmail.com> wrote:
> I understand that there are a number of MapReduce frameworks/tools
> that play nicely with Python (Disco, Dumbo/Hadoop), however these have
> "large" dependencies (Erlang/Java). Are there any MapReduce frameworks/
> tools which are either pure-Python, or play nicely with Python but
> don't require the Java/Erlang runtime environments?

I can't answer your question, but I would like to better understand
the problem you are trying to solve.  The Apache Hadoop/MapReduce Java
application isn't really that "large" by modern standards, although it
is generally run with large heap sizes for performance (-Xmx1024m or
larger for the mapred.child.java.opts parameter).

MapReduce is designed to do extremely fast parallel data set
processing on terabytes of data over hundreds of physical nodes; what
advantage would a pure Python approach have here?
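
For comparison, the programming model itself is small.  Below is a
minimal single-machine sketch in pure Python, assuming the standard
library's multiprocessing module stands in for a cluster.  The names
map_words, reduce_counts and word_count are hypothetical, chosen only
for illustration, and the sketch provides none of the data
distribution or fault tolerance that Hadoop does:

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def map_words(line):
    # Map step: count the words in a single line of input.
    return Counter(line.split())


def reduce_counts(left, right):
    # Reduce step: merge two partial word counts into one.
    return left + right


def word_count(lines, processes=2):
    # Run the map step in parallel worker processes (an assumption:
    # local processes stand in for cluster nodes), then reduce the
    # partial results in the parent process.
    with Pool(processes) as pool:
        partials = pool.map(map_words, lines)
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

Something like this scales only to the cores of one machine, which is
why the size of the data set matters to the answer.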


