Are sample weights normalized?
Hi,

I am using a one-class SVM for binary classification and was curious: what is the range/scale for sample weights? Are they normalized internally? For example:

- Sample 1, weight 1
- Sample 2, weight 10
- Sample 3, weight 100

Does this mean sample 3 will always be predicted as positive and sample 1 will never be predicted as positive? What about sample 2?

Also, what would happen if I assigned a high weight to the majority of the samples and low weights to the rest? E.g., if 80% of my samples were weighted 1000 and 20% were weighted 1.

A clarification, or a link to read up on how exactly weights affect the training process, would be really helpful.

Thanks,
Abhishek
Hi Abhishek,

Think of your example as being equivalent to putting 1 copy of sample 1, 10 copies of sample 2, and 100 copies of sample 3 in a dataset and then running your SVM. This is exactly true for some estimators and approximately true for others, but it is always a good intuition.

Hope this helps!
Michael
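For scikit-learn's OneClassSVM this intuition can be checked directly, since its fit method accepts a sample_weight argument. A minimal sketch (the toy data and weights below are made up for illustration; a fixed gamma is used so that the data-dependent "scale" setting doesn't confound the comparison):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.randn(3, 2)               # three toy training samples
weights = np.array([1, 10, 100])  # the weights from the example

# Fit once using sample_weight ...
clf_weighted = OneClassSVM(gamma=1.0).fit(X, sample_weight=weights)

# ... and once on a dataset where each sample is physically
# replicated according to its weight (1 + 10 + 100 = 111 rows).
X_repeated = np.repeat(X, weights, axis=0)
clf_repeated = OneClassSVM(gamma=1.0).fit(X_repeated)

# The two decision functions should roughly agree; as noted above,
# the equivalence is exact for some estimators and only approximate
# for others.
print(clf_weighted.decision_function(X))
print(clf_repeated.decision_function(X))
```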
Hi Michael, thanks for the response. Based on what you said, is it correct to assume that weights are relative to the size of the dataset? E.g., if my dataset size is 200 and I have 1 copy of sample 1, 10 copies of sample 2, and 100 copies of sample 3, sample 3 will be given a lot of focus during training because it makes up the majority; but if my dataset size were, say, 1 million, these weights wouldn't really affect much?

Thanks,
Abhishek
Well, that will depend on how your estimator works. But in general you are right: if you assume that samples 4 to N are weighted with the same weight (e.g. 1) in both cases, then sample 3 will be relatively less important in the larger training set.
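A hypothetical illustration of this dilution effect (the point, weights, and dataset sizes here are invented for the example): the same absolute weight of 100 accounts for a third of the total weight against 200 unit-weight background samples, but a negligible fraction against 20,000.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
x_special = np.array([[5.0, 5.0]])  # a point far from the bulk of the data

for n_background in (200, 20_000):  # small vs. much larger dataset
    X = np.vstack([rng.randn(n_background, 2), x_special])
    w = np.ones(len(X))
    w[-1] = 100.0  # same absolute weight for the special point in both runs

    clf = OneClassSVM(gamma=1.0).fit(X, sample_weight=w)
    # Against 200 background samples, the weighted point carries a large
    # share of the total weight and pulls the learned region toward it;
    # against 20,000 samples its relative influence is much smaller.
    print(n_background, clf.decision_function(x_special))
```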
participants (2)
- Abhishek Raj
- Michael Eickenberg