[scikit-learn] Are sample weights normalized?
Michael Eickenberg
michael.eickenberg at gmail.com
Fri Jul 28 16:29:24 EDT 2017
Well, that depends on how your estimator works. But in general you are
right - if you assume that samples 4 to N all have the same weight
(e.g. 1) in both cases, then sample 3 will be relatively less important
in the larger training set.
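
To make the intuition concrete, here is a minimal sketch in plain Python. It
uses a weighted mean as a stand-in estimator, not an SVM, and the sample
values are made up for illustration. It checks that integer weights act like
repeated samples, and computes the share of the total weight that sample 3
carries when samples 4 to N all have weight 1. The helper names
weighted_mean and weight_share are hypothetical, not part of scikit-learn.

```python
from fractions import Fraction


def weighted_mean(values, weights):
    """Minimizer of sum_i w_i * (x_i - m)**2; a toy stand-in for a weighted fit."""
    total = sum(weights)
    return sum(Fraction(w) * v for v, w in zip(values, weights)) / total


def weight_share(weights, index):
    """Fraction of the total weight carried by the sample at `index`."""
    return Fraction(weights[index], sum(weights))


# Three samples with the weights from the question (the values are made up).
values = [Fraction(2), Fraction(5), Fraction(9)]
weights = [1, 10, 100]

# Integer weights behave like repeating each sample that many times.
duplicated = [v for v, w in zip(values, weights) for _ in range(w)]
mean_weighted = weighted_mean(values, weights)
mean_duplicated = weighted_mean(duplicated, [1] * len(duplicated))

# For this unregularized toy estimator, rescaling all weights changes nothing.
mean_rescaled = weighted_mean(values, [7 * w for w in weights])

# Share of sample 3 when samples 4..N all have weight 1.
share_small = weight_share(weights + [1] * (200 - 3), 2)
share_large = weight_share(weights + [1] * (1_000_000 - 3), 2)

print(mean_weighted == mean_duplicated == mean_rescaled)
print(float(share_small), float(share_large))
```

The equivalence is exact here only because the weighted mean has no
regularization term. An estimator with a regularization parameter may respond
to the overall scale of the weights, so treat this as an intuition rather than
a statement about any particular scikit-learn estimator.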
On Fri, Jul 28, 2017 at 1:06 PM, Abhishek Raj via scikit-learn <
scikit-learn at python.org> wrote:
> Hi Michael, thanks for the response. Based on what you said, is it correct
> to assume that weights are relative to the size of the data set? E.g.:
>
> If my dataset size is 200 and I have 1 of sample 1, 10 of sample 2 and 100
> of sample 3, sample 3 will get a lot of focus during training because
> it makes up the majority, but if my dataset size were, say, 1 million, these
> weights wouldn't really have much effect?
>
> Thanks,
> Abhishek
>
> On Jul 28, 2017 10:41 PM, "Michael Eickenberg" <
> michael.eickenberg at gmail.com> wrote:
>
>> Hi Abhishek,
>>
>> Think of your example as being equivalent to putting 1 of sample 1, 10 of
>> sample 2 and 100 of sample 3 in a dataset and then running your SVM on it.
>> This is exactly true for some estimators and approximately true for
>> others, but always a good intuition.
>>
>> Hope this helps!
>> Michael
>>
>>
>> On Fri, Jul 28, 2017 at 10:01 AM, Abhishek Raj via scikit-learn <
>> scikit-learn at python.org> wrote:
>>
>>> Hi,
>>>
>>> I am using a one-class SVM for binary classification and was curious
>>> what the range/scale for sample weights is. Are they normalized internally?
>>> For example:
>>>
>>> Sample 1, weight - 1
>>> Sample 2, weight - 10
>>> Sample 3, weight - 100
>>>
>>> Does this mean Sample 3 will always be predicted as positive and sample
>>> 1 will never be predicted as positive? What about sample 2?
>>>
>>> Also, what would happen if I assigned a high weight to the majority of the
>>> samples and low weights to the rest? E.g. if 80% of my samples were weighted
>>> 1000 and 20% were weighted 1.
>>>
>>> A clarification or a link to read up on how exactly weights affect the
>>> training process would be really helpful.
>>>
>>> Thanks,
>>> Abhishek
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn