Re: [scikit-learn] sklearn.model_selection.GridSearchCV - unable to use n_jobs>1 on MacOS Sierra python 2.7
There are two cases. n_jobs > 1 works when the data is smaller, i.e. when the training docs numpy array is 15MB. It does not work when the training matrix is 100MB. My Mac has 16GB RAM.

In the second case, the jobs die out pretty quickly, within seconds, and the main Python process seems to die as well (minimal CPU usage). A popup message says 'python processes appear to have died'. This is when I run Python from the bash command line. When I run it in the Python GUI IDLE, a message pops up: 'your program is still running, sure you want to close window'.

What are these jobs anyway? Are they the various parameter combinations in param_grid, or lower-level jobs from the compiler etc.? Does each job replicate the training data in RAM?

regards

On Sun, Jan 7, 2018 at 11:35 AM, Sumeet Sandhu <sumeet.k.sandhu@gmail.com> wrote:
Hi,
I was able to run this with n_jobs=-1, and the activity monitor does show all 8 CPUs engaged, but the jobs start to die out one by one. I tried with n_jobs=2, same story. The only option that works is n_jobs=1. I played around with 'pre_dispatch' a bit - unclear what that does.
GRID = GridSearchCV(LogisticRegression(), param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=10, verbose=0, error_score=0, return_train_score=False)
GRID.fit(trainDocumentV, trainLabelV)
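For reference, here is a rough sketch of how many independent fits a call like the one above schedules. It assumes each parallel job is one parameter combination evaluated on one CV fold, which is how I understand GridSearchCV to split the work. The param_grid below is hypothetical, since the real one is not shown here; only cv=10 comes from the call above.

```python
from itertools import product

# Hypothetical grid for illustration; the real param_grid is not shown in this thread.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0], "penalty": ["l1", "l2"]}
n_folds = 10  # matches cv=10 in the GridSearchCV call

# Expand the grid into one dict per parameter combination.
keys = sorted(param_grid)
candidates = [
    dict(zip(keys, values))
    for values in product(*(param_grid[k] for k in keys))
]

# One task per (parameter combination, fold) pair; n_jobs only sets how many
# of these tasks run at the same time, not how many tasks exist.
tasks = [(params, fold) for params in candidates for fold in range(n_folds)]

print(len(candidates), "candidates x", n_folds, "folds =", len(tasks), "fits")
```

Setting verbose=1 in the real call should print a similar "folds x candidates" summary, which is an easy way to check this count against the actual grid.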
How can I sustain at least 3-4 parallel jobs?
thanks, Sumeet
And just now, the first case stopped working too: the 15MB training data now causes Python to die abruptly.

On Mon, Jan 8, 2018 at 9:22 PM, Sumeet Sandhu <sumeet.k.sandhu@gmail.com> wrote: