[scikit-learn] imbalanced classes: class_weight

S Hamidizade hamidizade.s at gmail.com
Tue Jun 19 10:52:28 EDT 2018


Hi

I would appreciate it if you could let me know the best way to categorize
the approaches that have been developed to deal with the class imbalance
problem.

*This article
<https://www.sciencedirect.com/science/article/pii/S0020025513005124>
categorizes them into:*

   1. Preprocessing: includes oversampling, undersampling and hybrid
   methods,
   2. Cost-sensitive learning: includes direct methods and meta-learning,
   the latter further divided into thresholding and sampling,
   3. Ensemble techniques: includes cost-sensitive ensembles and data
   preprocessing in conjunction with ensemble learning.
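For concreteness, the simplest preprocessing (data-level) method, random oversampling, fits in a few lines of plain Python. This is only an illustration, not code from the cited article, and the function name is hypothetical; in practice a library such as imbalanced-learn provides tested samplers.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen samples of every
    smaller class until each class is as large as the biggest one.
    The data change; the learning algorithm does not."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # indices of the samples that belong to this class
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(y[j])
    return X_out, y_out

X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(sorted(Counter(y_res).items()))  # [(0, 4), (1, 4)]
```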

*The second article <https://dl.acm.org/citation.cfm?id=2907070> uses this classification:*

   1. Data Pre-processing: includes distribution change and weighting the
   data space. One-class learning is considered a form of distribution change.
   2. Special-purpose Learning Methods
   3. Prediction Post-processing: includes the threshold method and
   cost-sensitive post-processing
   4. Hybrid Methods
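The threshold method listed under prediction post-processing leaves the trained model untouched and only moves the cut-off applied to its predicted probabilities. A minimal sketch, assuming the probabilities come from something like `predict_proba(X)[:, 1]` of a fitted scikit-learn classifier; the helper name is hypothetical and the values below are made up for illustration:

```python
def predict_with_threshold(probas, threshold=0.5):
    """Hypothetical helper: turn positive-class probabilities into 0/1 labels.

    Lowering the threshold below 0.5 makes the rare (positive) class easier
    to predict; the model itself is not retrained (post-processing)."""
    return [1 if p >= threshold else 0 for p in probas]

# made-up probabilities from an already-trained classifier
probas = [0.05, 0.20, 0.35, 0.60, 0.15]
print(predict_with_threshold(probas))       # [0, 0, 0, 1, 0]
print(predict_with_threshold(probas, 0.3))  # [0, 0, 1, 1, 0]
```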

*The third article
<https://link.springer.com/article/10.1007/s13748-016-0094-0>:*

   1. Data-level methods
   2. Algorithm-level methods
   3. Hybrid methods

The last classification also treats output adjustment as a separate
approach.

Could you please let me know which category the class_weight parameter of
scikit-learn's classifiers (e.g., logistic regression) falls into? Is it
correct to say:

In the first categorization, it falls into cost-sensitive learning.

In the second taxonomy, it would be classified into the third category,
i.e., cost-sensitive post-processing.

In the third classification, it should fall into algorithm-level methods.
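To help place class_weight, here is a minimal pure-Python sketch (not scikit-learn's implementation) of what the weights do. For class_weight="balanced", scikit-learn documents the weight of each class as n_samples / (n_classes * np.bincount(y)), and the weights scale each training sample's contribution to the loss the classifier minimizes. The helper names are hypothetical and the loss is simplified: a plain average with no regularization term, unlike LogisticRegression's actual objective.

```python
import math
from collections import Counter

def balanced_class_weights(y):
    """Weight per class = n_samples / (n_classes * n_samples_in_class),
    the formula documented for class_weight="balanced"."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}

def weighted_log_loss(y, probas, class_weight):
    """Simplified, unregularized binary cross-entropy in which each sample's
    term is multiplied by the weight of its class. A learner minimizing this
    during training pays more for mistakes on the rare class."""
    total = 0.0
    for label, p in zip(y, probas):
        term = -math.log(p) if label == 1 else -math.log(1.0 - p)
        total += class_weight[label] * term
    return total / len(y)

y = [0, 0, 0, 0, 1]
weights = balanced_class_weights(y)
print(weights)  # {0: 0.625, 1: 2.5}

# the same confident mistake costs more on the minority class
print(weighted_log_loss([1], [0.1], weights) > weighted_log_loss([0], [0.9], weights))  # True
```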

Best regards,