[Numpy-discussion] Using numpy on hadoop streaming: ImportError: cannot import name multiarray

Kartik Kumar Perisetla kartik.peri at gmail.com
Wed Feb 11 16:59:35 EST 2015


Hi David,
Thanks for your response.

But I can't install anything on the cluster.
Could anyone please help me understand how the file 'multiarray.so' is
used by the tagger? I mean, how is it loaded? (I assume it's something
like a DLL on Windows and a shared library on Unix-based systems.) Is it
a module, or something else?
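For reference, a quick check along these lines (on any machine where numpy
imports normally; the exact path will differ) shows where it comes from:

    import numpy.core.multiarray as multiarray

    # multiarray is a compiled extension module, loaded by the normal import
    # machinery from a shared library rather than from a .py file
    print(multiarray.__file__)   # e.g. .../numpy/core/multiarray.so on Linux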

What I did for now is package numpy so that it is present in the current
working directory of the mapper and the reducer. Control now goes into the
numpy copy shipped along with the mapper.
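The bootstrap in the mapper is roughly along these lines (a simplified
sketch of the idea, not my exact code; the .mod directory names are the
ones that appear in the traceback below):

    import os
    import sys

    # the streaming task runs the mapper in its own working directory,
    # where the shipped packages were placed
    job_dir = os.getcwd()
    sys.path.insert(0, os.path.join(job_dir, 'numpy.mod'))
    sys.path.insert(0, os.path.join(job_dir, 'glossextractionengine.mod'))

    import nltk   # loading the tagger pulls in numpy, which is where it fails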
But I still see the following error:

*File "glossextractionengine.mod/nltk/tag/__init__.py", line 123, in pos_tag
  File "glossextractionengine.mod/pickle.py", line 1380, in load
    return doctest.testmod()
  File "glossextractionengine.mod/pickle.py", line 860, in load
    return stopinst.value
  File "glossextractionengine.mod/pickle.py", line 1092, in load_global
    dispatch[GLOBAL] = load_global
  File "glossextractionengine.mod/pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
  File "numpy.mod/numpy/__init__.py", line 137, in <module>
  File "numpy.mod/numpy/add_newdocs.py", line 13, in <module>
  File "numpy.mod/numpy/lib/__init__.py", line 4, in <module>
  File "numpy.mod/numpy/lib/type_check.py", line 21, in <module>
  File "numpy.mod/numpy/core/__init__.py", line 9, in <module>
ImportError: No module named multiarray*


In this case the file 'multiarray.so' is present right within the core
package of the shipped numpy, but it is still not found.
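One way I could double-check from inside the mapper that the file actually
made it onto the node (a rough sketch; the paths follow the layout shown in
the traceback above):

    import os

    core_dir = os.path.join(os.getcwd(), 'numpy.mod', 'numpy', 'core')
    # list the shipped core package; multiarray.so should show up here
    print(sorted(os.listdir(core_dir)))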
Can anyone shed some light on this?

Thanks!
Kartik

On Wed, Feb 11, 2015 at 7:17 AM, Daπid <davidmenhur at gmail.com> wrote:

> On 11 February 2015 at 08:06, Kartik Kumar Perisetla
> <kartik.peri at gmail.com> wrote:
> > Thanks David. But do I need to install virtualenv on every node in hadoop
> > cluster? Actually I am not very sure whether same namenodes are assigned
> > for my every hadoop job. So how shall I proceed on such scenario.
>
> I have never used hadoop, but in the clusters I have used, you have a
> home folder on the central node, and each and every computing node has
> access to it. You can then install Python in your home folder and make
> every node run that, or pull a local copy.
>
> Probably the cluster support can clear this up further and adapt it to
> your particular case.
>
> /David.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
Regards,

Kartik Perisetla