[scikit-learn] sample_weight vs class_weight

mrschots maykonschots at gmail.com
Fri Dec 4 06:06:54 EST 2020


I have been using both in time-series classification: an exponential
decay in sample_weight AND class weights as a dictionary.
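
A minimal sketch of that setup, for illustration only. The synthetic data, the half-life of 50 samples and the class weights {0: 1.0, 1: 3.0} are assumptions made up for the example, not values from my real project; LogisticRegression stands in for any estimator that accepts both arguments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic, time-ordered data (assumption: row 0 is the oldest sample).
rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=n_samples) > 1.0).astype(int)  # imbalanced

# Exponential decay: the newest sample gets weight 1.0 and the weight
# halves every `half_life` samples going back in time.
age = np.arange(n_samples)[::-1]  # 0 for the most recent row
half_life = 50.0                  # hypothetical value, tune per problem
sample_weight = 0.5 ** (age / half_life)

# Class weights go to __init__, per-sample weights go to fit().
clf = LogisticRegression(class_weight={0: 1.0, 1: 3.0})
clf.fit(X, y, sample_weight=sample_weight)
```

Here the decay expresses "recent observations matter more", while the dictionary compensates for the rarer class, so the two weights carry different information.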

BR/Schots

On Fri, Dec 4, 2020 at 12:01, Nicolas Hug <niourf at gmail.com>
wrote:

> Basically passing class weights should be equivalent to passing
> per-class-constant sample weights.
>
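
The equivalence stated above can be checked with a small sketch. The dataset and the weights {0: 1.0, 1: 4.0} are arbitrary assumptions chosen for the example; it only checks LogisticRegression, so treat it as an illustration rather than a guarantee for every estimator.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(
    n_samples=300, n_features=5, weights=[0.8, 0.2], random_state=0
)

# Variant 1: class weights passed as a dict in __init__.
clf_cw = LogisticRegression(class_weight={0: 1.0, 1: 4.0}).fit(X, y)

# Variant 2: the same weights expanded to one constant per sample.
per_sample = np.where(y == 1, 4.0, 1.0)
clf_sw = LogisticRegression().fit(X, y, sample_weight=per_sample)

# Both fits should end up with the same coefficients.
coef_match = np.allclose(clf_cw.coef_, clf_sw.coef_)
intercept_match = np.allclose(clf_cw.intercept_, clf_sw.intercept_)
print(coef_match, intercept_match)
```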
> > Why do some estimators allow passing weights both as a dict in the init
> > and as sample weights in fit? What's the logic?
>
> SW is a per-sample property (aligned with X and y) so we avoid passing
> those to init because the data isn't known when initializing the estimator.
> It's only known when calling fit. In general we avoid passing data-related
> info into init so that the same instance can be fitted on any data (with
> different number of samples, different classes, etc.).
>
> We allow passing class_weight in init because the 'balanced' option is
> data-agnostic. Arguably, allowing a dict with actual class values violates
> the above argument (of not having data-related stuff in init), so I guess
> that's where the logic ends ;)
>
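
To illustrate why 'balanced' can live in init: the actual numbers are only derived from y at fit time. The sketch below uses compute_class_weight from sklearn.utils.class_weight on two made-up label vectors (both are assumptions for the example) and compares the result with the n_samples / (n_classes * bincount(y)) heuristic given in the scikit-learn documentation.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Two hypothetical label vectors with different classes and proportions.
y_a = np.array([0] * 90 + [1] * 10)
y_b = np.array([0] * 50 + [1] * 30 + [2] * 20)

# The same 'balanced' setting resolves to different weights per dataset.
w_a = compute_class_weight("balanced", classes=np.unique(y_a), y=y_a)
w_b = compute_class_weight("balanced", classes=np.unique(y_b), y=y_b)

# Documented heuristic: n_samples / (n_classes * np.bincount(y)).
manual_a = len(y_a) / (2 * np.bincount(y_a))

print(w_a)  # approx [0.556, 5.0]
print(w_b)  # approx [0.667, 1.111, 1.667]
```

A dict such as {0: 1.0, 1: 4.0}, by contrast, already names concrete class labels, which is the inconsistency mentioned above.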
> As to why one would use both, I'm not so sure honestly.
>
>
> Nicolas
>
>
> On 12/4/20 10:40 AM, Sole Galli via scikit-learn wrote:
>
> Actually, I found the answer. Both seem to modify the loss function of the
> various algorithms; I include some links below.
>
> If we pass both *class_weight* and *sample_weight*, then the final cost /
> weight is a combination of both.
>
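
On "a combination of both": the class_weight documentation of several estimators says these weights are multiplied with sample_weight when both are given. A small sketch of that reading, using DecisionTreeClassifier; the data, the random per-sample weights and the dict {0: 1.0, 1: 4.0} are assumptions for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_sample_weight

X, y = make_classification(
    n_samples=300, n_features=5, weights=[0.8, 0.2], random_state=0
)
rng = np.random.default_rng(0)
sw = rng.uniform(0.5, 2.0, size=len(y))  # arbitrary per-sample weights
cw = {0: 1.0, 1: 4.0}                    # arbitrary class weights

# Fit 1: class_weight in __init__ plus sample_weight in fit().
both = DecisionTreeClassifier(max_depth=3, class_weight=cw, random_state=0)
both.fit(X, y, sample_weight=sw)

# Fit 2: no class_weight, but sample weights pre-multiplied by the
# per-sample expansion of the class weights.
combined = sw * compute_sample_weight(cw, y)
product_only = DecisionTreeClassifier(max_depth=3, random_state=0)
product_only.fit(X, y, sample_weight=combined)

# If the weights are multiplied, both trees make the same predictions.
same_proba = np.allclose(both.predict_proba(X), product_only.predict_proba(X))
print(same_proba)
```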
> I have a follow-up question: in which scenario would we use both? Why do
> some estimators allow passing weights both as a dict in the init and as
> sample weights in fit? What's the logic? I found it a bit confusing at the
> beginning.
>
> Thank you!
>
>
> https://stackoverflow.com/questions/30805192/scikit-learn-random-forest-class-weight-and-sample-weight-parameters
>
>
> https://stackoverflow.com/questions/30972029/how-does-the-class-weight-parameter-in-scikit-learn-work/30982811#30982811
>
> Soledad Galli
> https://www.trainindata.com/
>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, December 3, 2020 11:55 AM, Sole Galli via scikit-learn
> <scikit-learn at python.org> wrote:
>
> Hello team,
>
> What is the difference in the implementation of class_weight and
> sample_weight in those algorithms that support both, like random forest or
> logistic regression?
>
> Do both modify the loss function, and in a similar way?
>
> Thank you!
>
> Sole
>
>
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
-- 
Schots