[Baypiggies] Fw: pydoop -- Python MapReduce and HDFS API for Hadoop

Joel VanderKwaak joelvanderkwaak at yahoo.com
Fri Nov 6 20:59:21 CET 2009


fyi - I haven't used it, and can't comment ;)



----- Forwarded Message ----
From: Simone Leo <simone.leo at crs4.it>
To: general at hadoop.apache.org
Sent: Fri, November 6, 2009 9:20:36 AM
Subject: pydoop -- Python MapReduce and HDFS API for Hadoop

Hello everybody,

we recently released pydoop, a Python MapReduce and HDFS API for Hadoop:

http://pydoop.sourceforge.net

It is implemented as a Boost.Python wrapper around the C++ code (pipes
and libhdfs). It allows you to write complete MapReduce applications in
CPython, with the same capabilities as the C++ API. Here is a minimal
wordcount example:


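from pydoop.pipes import Mapper, Reducer, Factory, runTask

class WordCountMapper(Mapper):

  def __init__(self, context):
    super(WordCountMapper, self).__init__(context)

  def map(self, context):
    # The input value is one line of text; split it on whitespace.
    words = context.getInputValue().split()
    for w in words:
      # Emit each word with a count of "1" (keys and values are strings).
      context.emit(w, "1")

class WordCountReducer(Reducer):

  def __init__(self, context):
    super(WordCountReducer, self).__init__(context)

  def reduce(self, context):
    s = 0
    # Iterate over all values grouped under the current key and sum them.
    while context.nextValue():
      s += int(context.getInputValue())
    # Emit the word together with its total count, again as a string.
    context.emit(context.getInputKey(), str(s))

# Hand the mapper and reducer classes to the pipes runtime.
runTask(Factory(WordCountMapper, WordCountReducer))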
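
To follow the data flow of the wordcount example without a Hadoop cluster, here is a minimal sketch in plain Python. It is not part of pydoop and does not use its API: map_words, reduce_counts and run_local are hypothetical names, and the in-memory grouping only stands in for the shuffle step that Hadoop performs between the map and reduce phases. Counts are kept as strings to mirror the example above.

```python
from collections import defaultdict


def map_words(line):
    """Yield (word, "1") pairs, mirroring what WordCountMapper.map emits."""
    for word in line.split():
        yield word, "1"


def reduce_counts(key, values):
    """Sum the string counts for one key, mirroring WordCountReducer.reduce."""
    return key, str(sum(int(v) for v in values))


def run_local(lines):
    """Hypothetical stand-in for the Hadoop shuffle: group values by key in memory."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            grouped[key].append(value)
    # Reduce each key's values, visiting keys in sorted order.
    return dict(reduce_counts(k, vs) for k, vs in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_local(["a b a", "b c"]))
```

Calling run_local on a list of text lines returns a dict mapping each word to its total count as a string, which is the same set of key/value pairs the reducer above would emit.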


Any feedback would be greatly appreciated.

-- 
Simone Leo
Distributed Computing group
Advanced Computing and Communications program
CRS4
POLARIS - Building #1
Piscina Manna
I-09010 Pula (CA) - Italy
e-mail: simleo at crs4.it
http://www.crs4.it