[scikit-learn] sample_weight vs class_weight
solegalli at protonmail.com
Sat Dec 5 07:55:41 EST 2020
Thank you, guys! Very helpful :)
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, December 4, 2020 12:06 PM, mrschots <maykonschots at gmail.com> wrote:
> I have been using both in time-series classification. I put an exponential decay in sample_weight AND pass class weights as a dictionary.
> On Fri, Dec 4, 2020 at 12:01, Nicolas Hug <niourf at gmail.com> wrote:
>> Basically, passing class weights should be equivalent to passing per-class-constant sample weights, i.e. every sample of a given class gets that class's weight.
>>> Why do some estimators allow passing weights both as a dict in init and as sample weights in fit? What's the logic?
>> sample_weight is a per-sample property (aligned with X and y), so we avoid passing it to init because the data isn't known when initializing the estimator. It's only known when calling fit. In general, we avoid passing data-related info into init so that the same instance can be fitted on any data (with a different number of samples, different classes, etc.).
>> We allow passing class_weight in init because the 'balanced' option is data-agnostic. Arguably, allowing a dict with actual class values violates the above argument (of not having data-related stuff in init), so I guess that's where the logic ends ;)
>> As to why one would use both, I'm not so sure honestly.
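A minimal sketch of the equivalence described above, in case it helps later readers. The synthetic data set, the weight values in the dict, and the choice of LogisticRegression are assumptions made only for illustration; this is not taken from the scikit-learn source.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic, imbalanced binary problem (illustrative data only).
X, y = make_classification(
    n_samples=200, n_features=5, weights=[0.8, 0.2], random_state=0
)

# Hypothetical class weights: up-weight the minority class.
class_weight = {0: 1.0, 1: 4.0}

# Option 1: class weights passed as a dict in init.
clf_cw = LogisticRegression(class_weight=class_weight, tol=1e-8, max_iter=1000)
clf_cw.fit(X, y)

# Option 2: the same weights expanded to one constant per sample of each class,
# passed to fit as sample_weight.
sample_weight = np.array([class_weight[int(label)] for label in y])
clf_sw = LogisticRegression(tol=1e-8, max_iter=1000)
clf_sw.fit(X, y, sample_weight=sample_weight)

# Largest absolute difference between the two sets of coefficients.
coef_gap = float(np.max(np.abs(clf_cw.coef_ - clf_sw.coef_)))
print("max coefficient difference:", coef_gap)
```

If the two routes are indeed equivalent for this estimator, the printed difference should be zero up to solver tolerance.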
>> On 12/4/20 10:40 AM, Sole Galli via scikit-learn wrote:
>>> Actually, I found the answer. Both seem to act by weighting the loss function of the various algorithms; below I include some links.
>>> If we pass both class_weight and sample_weight, then the final cost / weight is a combination of both.
>>> I have a follow-up question: in which scenario would we use both? Why do some estimators allow passing weights both as a dict in init and as sample weights in fit? What's the logic? I found it a bit confusing at first.
>>> Thank you!
>>> Soledad Galli
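A rough sketch of the "combination of both" mentioned above. The scikit-learn docstrings for class_weight state that these weights are multiplied with sample_weight when both are given, so the check below compares passing both against passing their product by hand. The recency weights, the decay rate 0.01, and the class-weight values are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic, imbalanced binary problem; rows are treated as time-ordered
# (an assumption for illustration only).
X, y = make_classification(
    n_samples=200, n_features=5, weights=[0.8, 0.2], random_state=0
)

# Hypothetical exponential decay: the last (most recent) row gets weight 1.
age = np.arange(len(y))[::-1]
recency_weight = np.exp(-0.01 * age)

# Hypothetical class weights and their per-sample expansion.
class_weight = {0: 1.0, 1: 4.0}
per_class = np.array([class_weight[int(label)] for label in y])

# Route A: class_weight in init AND sample_weight in fit.
clf_both = LogisticRegression(class_weight=class_weight, tol=1e-8, max_iter=1000)
clf_both.fit(X, y, sample_weight=recency_weight)

# Route B: a single sample_weight equal to the product of the two.
clf_product = LogisticRegression(tol=1e-8, max_iter=1000)
clf_product.fit(X, y, sample_weight=recency_weight * per_class)

# Largest absolute difference between the two sets of coefficients.
combined_gap = float(np.max(np.abs(clf_both.coef_ - clf_product.coef_)))
print("max coefficient difference:", combined_gap)
```

This also hints at one scenario for using both: class_weight handles class imbalance, while sample_weight carries a per-sample notion such as recency.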
>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>> On Thursday, December 3, 2020 11:55 AM, Sole Galli via scikit-learn <scikit-learn at python.org> wrote:
>>>> Hello team,
>>>> What is the difference in the implementation of class_weight and sample_weight in those algorithms that support both, like random forest or logistic regression?
>>>> Are both modifying the loss function in a similar way?
>>>> Thank you!
>>> scikit-learn mailing list
>>> scikit-learn at python.org