[scikit-learn] imbalanced classes: class_weight
S Hamidizade
hamidizade.s at gmail.com
Tue Jun 19 10:52:28 EDT 2018
Hi
I would appreciate it if you could let me know the best way to
categorize the approaches that have been developed to deal with the class
imbalance problem.
*This article
<https://www.sciencedirect.com/science/article/pii/S0020025513005124>
categorizes them into:*
1. Preprocessing: includes oversampling, undersampling and hybrid
methods,
2. Cost-sensitive learning: includes direct methods and meta-learning,
the latter of which further divides into thresholding and sampling,
3. Ensemble techniques: includes cost-sensitive ensembles and data
preprocessing in conjunction with ensemble learning.
*The second <https://dl.acm.org/citation.cfm?id=2907070> classification:*
1. Data Pre-processing: includes distribution change and weighting the
data space. One-class learning is considered a form of distribution change.
2. Special-purpose Learning Methods
3. Prediction Post-processing: includes the threshold method and
cost-sensitive post-processing
4. Hybrid Methods
*The third article
<https://link.springer.com/article/10.1007/s13748-016-0094-0>:*
1. Data-level methods
2. Algorithm-level methods
3. Hybrid methods
The last classification also considers output adjustment as an independent
approach.
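To illustrate what I understand by output adjustment (thresholding), here is a minimal sketch in plain Python: the training data and the fitted model are left untouched, and only the decision threshold applied to the predicted probabilities is moved. The function names, the probabilities and the cost values are hypothetical and chosen only for illustration; the threshold formula is the usual cost-based one, assuming calibrated probabilities and zero cost for correct predictions.

```python
def cost_based_threshold(cost_fp, cost_fn):
    """Threshold on P(y=1) that minimises expected cost, assuming calibrated
    probabilities and zero cost for correct predictions."""
    return cost_fp / (cost_fp + cost_fn)


def predict_with_threshold(probs, threshold=0.5):
    """Turn predicted P(y=1) values into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probs]


# Made-up model outputs for four samples
probs = [0.05, 0.25, 0.35, 0.60]

# Default decision rule: threshold 0.5
default_labels = predict_with_threshold(probs)

# Assumption: missing a positive is 4x as costly as a false alarm,
# which moves the threshold from 0.5 down to 0.2
threshold = cost_based_threshold(cost_fp=1.0, cost_fn=4.0)
adjusted_labels = predict_with_threshold(probs, threshold)
```

With the lower threshold, more samples are assigned to the positive (minority) class even though the model itself has not changed.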
Could you please let me know which category the class_weight option in
sklearn's classifiers (e.g., logistic regression) falls into? Is it correct
to say:
1. In the first categorization, it falls into cost-sensitive learning,
2. In the second taxonomy, it falls into the third category, i.e.,
cost-sensitive post-processing,
3. In the third classification, it falls into algorithm-level methods?
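For concreteness, this is a minimal sketch of what I understand class weighting to do: each sample's contribution to the training loss is rescaled by the weight of its class, while the data and the predictions themselves are left as they are. It is plain Python and not scikit-learn's actual implementation; the helper names and the toy data are hypothetical. The "balanced" weights follow the n_samples / (n_classes * class_count) heuristic described in the scikit-learn documentation.

```python
import math
from collections import Counter


def balanced_class_weights(y):
    """Weights of the form n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


def weighted_log_loss(y, p, class_weight):
    """Weighted average binary log loss; p[i] is the predicted P(y=1)."""
    total = 0.0
    weight_sum = 0.0
    for label, prob in zip(y, p):
        w = class_weight[label]
        # Loss of the true class, scaled by that class's weight
        total += -w * math.log(prob if label == 1 else 1.0 - prob)
        weight_sum += w
    return total / weight_sum


# Toy data (made up): 8 negatives, 2 positives
y = [0] * 8 + [1] * 2
p = [0.1] * 8 + [0.4] * 2

weights = balanced_class_weights(y)
plain = weighted_log_loss(y, p, {0: 1.0, 1: 1.0})
balanced = weighted_log_loss(y, p, weights)
```

In this toy case the minority class gets the larger weight, so its poorly predicted samples dominate the weighted loss, which is why I would think of class_weight as acting on the learning objective rather than on the data or on the outputs.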
Best regards,