[scikit-learn] Need Urgent help please in resolving JobLibMemoryError

Debabrata Ghosh mailfordebu at gmail.com
Fri Dec 9 06:03:30 EST 2016


Thanks Piotr for your feedback !

I did look into sparkit-learn yesterday but couldn't confirm that it
contains a RandomForestClassifier method. I would need to ask the customer
to download it for me, as I don't have permission to do that myself. Could
you please help me confirm whether sparkit-learn has equivalents of the
following scikit-learn methods:

1. sklearn.ensemble -> RandomForestClassifier

2. sklearn.cross_validation -> StratifiedKFold

3. sklearn.cross_validation -> train_test_split

Is there a URL for sparkit-learn, similar to the scikit-learn documentation,
where all the methods are listed?

I have figured out that sparkit-learn needs to be downloaded from
https://pypi.python.org/pypi/sparkit-learn, but does anything else need to
be downloaded apart from that?

I just wanted to check once before asking my customer, as otherwise it would
be a bit embarrassing.

Thanks again !

Cheers,

Debu

On Fri, Dec 9, 2016 at 3:37 PM, Piotr Bialecki <piotr.bialecki at hotmail.de>
wrote:

> Hi Debu,
>
> I have not worked with pyspark yet and cannot resolve your error,
> but have you tried out sparkit-learn?
> https://github.com/lensacom/sparkit-learn
>
> It seems to be a package combining pyspark with sklearn and it also has a
> RandomForest and other classifiers:
> (SparkRandomForestClassifier, https://github.com/lensacom/
> sparkit-learn/blob/master/splearn/ensemble/__init__.py)
>
>
> Greets,
> Piotr
>
> On 09.12.2016 10:56, Debabrata Ghosh wrote:
>
> Hi Piotr,
>                      Yes, I did use n_jobs = -1 as well, but the code
> didn't run successfully. On my output screen, I got the following message
> instead of the JoblibMemoryError:
>
> 16/12/08 22:12:26 INFO YarnExtensionServices: In shutdown hook for
> org.apache.spark.scheduler.cluster.YarnExtensionServices$$anon$1 at 176b071d
> 16/12/08 22:12:26 INFO YarnHistoryService: Shutting down: pushing out 0
> events
> 16/12/08 22:12:26 INFO YarnHistoryService: Event handler thread stopping
> the service
> 16/12/08 22:12:26 INFO YarnHistoryService: Stopping dequeue service, final
> queue size is 0
> 16/12/08 22:12:26 INFO YarnHistoryService: Stopped: Service History
> Service in state History Service: STOPPED
> endpoint=http://servername.com:8188/ws/v1/timeline/; bonded to
> ATS=false; listening=true; batchSize=3; flush count=17; current queue
> size=0; total number queued=52, processed=50; post failures=0;
> 16/12/08 22:12:26 INFO SparkContext: Invoking stop() from shutdown hook
> 16/12/08 22:12:26 INFO YarnHistoryService: History service stopped;
> ignoring queued event : [1481256746854]: SparkListenerApplicationEnd(14
> 81256746854)
>
>                      Just to give you some background, I am executing the
> scikit-learn RandomForestClassifier via a pyspark command. I don't understand
> what has gone wrong when using n_jobs = -1, and why the program suddenly shuts
> down certain services. Could you please suggest a remedy, as I have been given
> the task of running this via pyspark itself.
>
>                       Thanks in advance !
>
> Cheers,
>
> Debu
>
> On Fri, Dec 9, 2016 at 2:48 PM, Piotr Bialecki <piotr.bialecki at hotmail.de>
> wrote:
>
>> Hi Debu,
>>
>> it seems that you are running out of memory.
>> Try using fewer processes.
>> I don't think that n_jobs = 1000 will perform as you wish.
>>
>> Setting n_jobs to -1 uses as many jobs as there are cores in your system.
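>>
>> As a rough sketch of what I mean (the n_estimators value here is only an
>> example, adjust it to whatever your memory allows):
>>
>> from sklearn.ensemble import RandomForestClassifier
>> # n_jobs=-1: one job per CPU core instead of 1000 threads
>> clf = RandomForestClassifier(n_estimators=500, n_jobs=-1)
>> clf.fit(p_input_features_train, p_input_labels_train)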
>>
>>
>> Greets,
>> Piotr
>>
>>
>> On 09.12.2016 08:16, Debabrata Ghosh wrote:
>>
>> Hi All,
>>
>>                       Greetings !
>>
>>
>>
>> I am getting a JoblibMemoryError while executing scikit-learn
>> RandomForestClassifier code. Here is my code in short:
>>
>>
>>
>> from sklearn.ensemble import RandomForestClassifier
>>
>> from sklearn.cross_validation import train_test_split
>>
>> import pandas as pd
>>
>> import numpy as np
>>
>> clf = RandomForestClassifier(n_estimators=5000, n_jobs=1000)
>>
>> clf.fit(p_input_features_train,p_input_labels_train)
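>>
>> For scale, a rough back-of-envelope estimate of the input size described
>> below (5 million rows x 134 float32 features, per the traceback):
>>
>> 5000000 * 134 * 4 / 1e9   # ~2.7 GB just for the feature matrix X
>>
>> and the 5000 trees that the forest keeps in memory will typically need far
>> more than that.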
>>
>>
>> The dataframe p_input_features contains 134 columns (features) and 5
>> million rows (observations). The exact error message is given below:
>>
>>
>> Executing Random Forest Classifier
>> Traceback (most recent call last):
>>   File "/home/user/rf_fold.py", line 43, in <module>
>>     clf.fit(p_features_train,p_labels_train)
>>   File "/var/opt/lib/python2.7/site-packages/sklearn/ensemble/forest.py",
>> line 290, in fit
>>     for i, t in enumerate(trees))
>>   File "/var/opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py",
>> line 810, in __call__
>>     self.retrieve()
>>   File "/var/opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py",
>> line 757, in retrieve
>>     raise exception
>> sklearn.externals.joblib.my_exceptions.JoblibMemoryError:
>> JoblibMemoryError
>> ___________________________________________________________________________
>> Multiprocessing exception:
>> ...........................................................................
>>
>> /var/opt/lib/python2.7/site-packages/sklearn/ensemble/forest.py in
>> fit(self=RandomForestClassifier(bootstrap=True, class_wei...te=None,
>> verbose=0,
>>             warm_start=False), X=array([[ 0.        ,  0.        ,
>> 0.        , ....        0.        ,  0.        ]], dtype=float32),
>> y=array([[ 0.],
>>        [ 0.],
>>        [ 0.],
>>        ...,
>>        [ 0.],
>>        [ 0.],
>>        [ 0.]]), sample_weight=None)
>>     285             trees = Parallel(n_jobs=self.n_jobs,
>> verbose=self.verbose,
>>     286                              backend="threading")(
>>     287                 delayed(_parallel_build_trees)(
>>     288                     t, self, X, y, sample_weight, i, len(trees),
>>     289                     verbose=self.verbose,
>> class_weight=self.class_weight)
>> --> 290                 for i, t in enumerate(trees))
>>         i = 4999
>>     291
>>     292             # Collect newly grown trees
>>     293             self.estimators_.extend(trees)
>>     294
>>
>> ...........................................................................
>>
>>
>>
>> Could you please help me identify a possible resolution to this?
>>
>>
>> Thanks,
>>
>> Debu
>>
>>
>
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>