MLPClassifier/Regressor and Kernel Processes when Multiprocessing
Hi SciKit-Learn folks,

I am building a stacked generalization classifier using the multilayer perceptron classifier as one of its submodels. All data have been preprocessed appropriately, and I am tuning each submodel's hyperparameters with a customized randomized search protocol (very similar to sklearn's RandomizedSearchCV). Importantly, I am using Python's multiprocessing.Pool() to parallelize this search.

When I start the hyperparameter search, jobs/threads do indeed spawn appropriately. Tuning the other submodels (RandomForestClassifier, SVC, GradientBoostingClassifier, SGDClassifier) works perfectly, with each job (a model with particular randomized parameters) being scored with cross_val_score and returning when the Pool of workers is complete. All is well until I reach the MLPClassifier model. Jobs spawn as with the other models; however, System CPU (Linux kernel) processes surge and overwhelm my server. Approximately 20% of the CPUs are running User processes, while the other 80% of the CPUs are running System/Kernel processes, causing an immense slow-down. Again, this only happens with the MLPClassifier - all other models run appropriately with ~98% User processes and ~2% System/Kernel processes.

Is there something unique in the MLPClassifier/Regressor models that causes increased System/Kernel processes compared to other models? In an attempt to troubleshoot, I used sklearn's RandomizedSearchCV instead of my custom implementation, and the same problem occurs (with n_jobs specified in the same way).

Any help with why the MLP models behave this way during multiprocessing is much appreciated.

Best,
Taylor Keding
Hi,

I cannot look into this in much detail. However, I would advise you to try using loky or joblib instead of multiprocessing, as a lot of work has been put into them to protect against problems that can arise in multi-process parallel computing (for instance, the underlying numerical libraries may not be fork-safe, or they may have parallel computing abilities themselves).

Hope this helps,
Gaël

On Tue, Apr 28, 2020 at 02:06:00PM -0500, Taylor J Keding wrote:
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
-- Gael Varoquaux Research Director, INRIA Visiting professor, McGill http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
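For readers following this thread, the suggestion to use joblib with the loky backend in place of multiprocessing.Pool() might look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names (`score_candidate`, `random_search`), the parameter ranges, and the toy data are hypothetical and are not taken from the original poster's code. The thread does not confirm what caused the high System CPU usage; nested threading in the numerical libraries is one of the possibilities mentioned above, and recent joblib versions try to limit those threads in loky workers.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier


def score_candidate(params, X, y):
    """Score one MLP configuration by mean cross-validation accuracy."""
    # max_iter is kept small so the sketch runs quickly (assumption).
    model = MLPClassifier(max_iter=50, random_state=0, **params)
    return params, cross_val_score(model, X, y, cv=3).mean()


def random_search(X, y, n_candidates=4, n_jobs=2, seed=0):
    """Hypothetical stand-in for a custom randomized search."""
    rng = np.random.default_rng(seed)
    # Illustrative parameter ranges only, not the poster's search space.
    candidates = [
        {
            "hidden_layer_sizes": (int(rng.integers(5, 20)),),
            "alpha": float(10 ** rng.uniform(-5, -1)),
        }
        for _ in range(n_candidates)
    ]
    # The loky backend runs each candidate in a separate worker process,
    # taking the place of multiprocessing.Pool() in the custom search.
    results = Parallel(n_jobs=n_jobs, backend="loky")(
        delayed(score_candidate)(params, X, y) for params in candidates
    )
    # Return the (params, score) pair with the highest mean accuracy.
    return max(results, key=lambda item: item[1])


if __name__ == "__main__":
    X, y = make_classification(n_samples=120, n_features=8, random_state=0)
    best_params, best_score = random_search(X, y)
    print(best_params, best_score)
```

If the slow-down persists with this setup, lowering `n_jobs` or capping the numerical library's threads (for example with the separate threadpoolctl package) may be worth trying; whether either helps in this particular case is not established here.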