Suggestions for Python MapReduce?
Caseyweb at gmail.com
Wed Jul 22 09:23:33 EDT 2009
On Jul 22, 5:27 am, Phillip B Oldham <phillip.old... at gmail.com> wrote:
> I understand that there are a number of MapReduce frameworks/tools
> that play nicely with Python (Disco, Dumbo/Hadoop), however these have
> "large" dependencies (Erlang/Java). Are there any MapReduce frameworks/
> tools which are either pure-Python, or play nicely with Python but
> don't require the Java/Erlang runtime environments?
I can't answer your question, but I would like to better understand the
problem you are trying to solve. The Apache Hadoop/MapReduce Java
application isn't really that "large" by modern standards, although it
is generally run with large heap sizes for performance (-Xmx1024m or
larger for the mapred.child.java.opts parameter).
MapReduce is designed to do extremely fast parallel data set processing
on terabytes of data over hundreds of physical nodes; what advantage
would a pure Python approach have here?
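
To make the comparison concrete, here is a rough single-machine sketch of the map/shuffle/reduce pattern in pure Python, using only the standard library. It is not the API of Hadoop, Disco, or Dumbo; the names map_phase, shuffle, reduce_phase, and word_count are hypothetical and exist only for illustration. It covers the programming model, not the distribution, scheduling, or fault tolerance a cluster framework provides.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents, map_fn=map):
    # map_fn is an assumption of this sketch: any callable with the
    # signature of the built-in map (function, iterable) can be passed in.
    mapped = map_fn(map_phase, documents)
    groups = shuffle(chain.from_iterable(mapped))
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = word_count(["a b a", "b c"])
print(counts)
```

Passing the map method of a multiprocessing.Pool as map_fn would spread the map step over the cores of one machine, but everything still has to fit on that machine, which is the limit the question above is getting at.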