<div dir="ltr"><p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:15px;clear:both;font-family:'Helvetica Neue',Helvetica,Arial,sans-serif;line-height:17.7272720336914px">Hi all,</p><p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:15px;clear:both;font-family:'Helvetica Neue',Helvetica,Arial,sans-serif;line-height:17.7272720336914px"><span style="line-height:17.7272720336914px">For one of my projects I am using NLTK for POS tagging, which internally loads an 'english.pickle' file. I managed to package the NLTK library together with these pickle files and make them available to the mapper and reducer of a Hadoop streaming job using the -file option.</span><br></p><p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:15px;clear:both;font-family:'Helvetica Neue',Helvetica,Arial,sans-serif;line-height:17.7272720336914px">However, when NLTK tries to load that pickle file, it fails with an error about numpy, since the cluster I am running this job on does not have numpy installed. I also don't have root access, so I can't install numpy or any other package on the cluster. The only way is therefore to package the Python modules and ship them to the mapper and reducer, which I managed to do. 
But now the problem is that when numpy is imported, it imports multiarray by default (as seen in __init__.py), and this is where I get the error:</p><pre style="margin-top:0px;padding:5px;border:0px;font-size:13px;overflow:auto;width:auto;max-height:600px;font-family:Consolas,Menlo,Monaco,'Lucida Console','Liberation Mono','DejaVu Sans Mono','Bitstream Vera Sans Mono','Courier New',monospace,sans-serif;word-wrap:normal;background-color:rgb(238,238,238)"><code style="margin:0px;padding:0px;border:0px;font-family:Consolas,Menlo,Monaco,'Lucida Console','Liberation Mono','DejaVu Sans Mono','Bitstream Vera Sans Mono','Courier New',monospace,sans-serif;white-space:inherit">File "/usr/lib64/python2.6/pickle.py", line 1370, in load
return Unpickler(file).load()
File "/usr/lib64/python2.6/pickle.py", line 858, in load
dispatch[key](self)
File "/usr/lib64/python2.6/pickle.py", line 1090, in load_global
klass = self.find_class(module, name)
File "/usr/lib64/python2.6/pickle.py", line 1124, in find_class
__import__(module)
File "numpy.mod/numpy/__init__.py", line 170, in <module>
File "numpy.mod/numpy/add_newdocs.py", line 13, in <module>
File "numpy.mod/numpy/lib/__init__.py", line 8, in <module>
File "numpy.mod/numpy/lib/type_check.py", line 11, in <module>
File "numpy.mod/numpy/core/__init__.py", line 6, in <module>
ImportError: cannot import name multiarray
</code></pre><p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:15px;clear:both;font-family:'Helvetica Neue',Helvetica,Arial,sans-serif;line-height:17.7272720336914px">I tried copying the numpy directory from my local machine, which contains multiarray.pyd, to the cluster to make it available to the mapper and reducer, but this didn't help.</p><p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:15px;clear:both;font-family:'Helvetica Neue',Helvetica,Arial,sans-serif;line-height:17.7272720336914px">Any input on how to resolve this (keeping in mind the constraint that I cannot install anything on the cluster machines)?</p><p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:15px;clear:both;font-family:'Helvetica Neue',Helvetica,Arial,sans-serif;line-height:17.7272720336914px">Thanks!</p><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div><span style="color:rgb(136,136,136)">Regards,</span><br style="color:rgb(136,136,136)"><br style="color:rgb(136,136,136)"><font color="#888888">Kartik Perisetla</font></div></div></div>
</div>