[scikit-learn] OneClassSVM | Different results on different runs

Albert Thomas albertthomas88 at gmail.com
Thu Aug 3 09:17:38 EDT 2017


Yes, in fact, changing the random_state might have an influence on the
result. The docstring of the random_state parameter for OneClassSVM seems
incorrect though, since it refers to shuffling the data for probability
estimation, which this estimator does not support...
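
A rough way to check Nicolas's point about the tolerance would be
something like this (an untested sketch on synthetic data; nu and the tol
values are arbitrary, and random_state is passed here as a OneClassSVM
constructor parameter, which it still is in current releases):

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)
X = rng.randn(500, 10)

def seed_gap(tol):
    # Fit the same data with two different seeds and measure how far
    # apart the resulting decision functions end up.
    f0 = OneClassSVM(nu=0.5, tol=tol, random_state=0).fit(X)
    f1 = OneClassSVM(nu=0.5, tol=tol, random_state=1).fit(X)
    return np.abs(f0.decision_function(X) - f1.decision_function(X)).max()

# If the randomness in SMO only shows up with a loose tolerance, the
# gap should shrink as tol gets tighter.
print(seed_gap(1e-1))
print(seed_gap(1e-6))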

Albert

On Thu, Aug 3, 2017 at 1:55 PM Nicolas Goix <goix.nicolas at gmail.com> wrote:

> @albertcthomas isn't there some randomness in SMO which could influence
> the result if the tolerance parameter is too large?
>
> On Aug 3, 2017 1:28 PM, "Albert Thomas" <albertthomas88 at gmail.com> wrote:
>
>> Hi Abhishek,
>>
>> Could you provide a small code snippet? I don't think the random_state
>> parameter should influence the result of OneClassSVM, as there is no
>> probability estimation for this estimator. Note also that the passage
>> quoted below from the docs is about LinearSVC, which is liblinear-based;
>> OneClassSVM is based on libsvm.
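>>
>> For what it's worth, a minimal way to check where the seed is actually
>> used (a sketch; OneClassSVM simply has no probability machinery):
>>
>> from sklearn.svm import SVC, OneClassSVM
>>
>> # SVC exposes a probability option (internal Platt scaling, which is
>> # where the shuffling happens); OneClassSVM does not, and it has no
>> # predict_proba either.
>> print('probability' in SVC().get_params())          # True
>> print('probability' in OneClassSVM().get_params())  # False
>> print(hasattr(OneClassSVM(), 'predict_proba'))      # False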
>>
>> Albert
>>
>> On Thu, Aug 3, 2017 at 12:41 PM Jaques Grobler <jaquesgrobler at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> The random_state parameter seeds the pseudo random number generator
>>> that is used when shuffling your data for probability estimation. From
>>> the docstring:
>>>
>>> "The seed of the pseudo random number generator to use when shuffling
>>> the data for probability estimation. A seed can be provided to control
>>> the shuffling for reproducible behavior."
>>>
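>>> For example, with SVC a fixed seed makes the probability estimates
>>> reproducible (untested sketch; the data and sizes are arbitrary):
>>>
>>> import numpy as np
>>> from sklearn.datasets import make_classification
>>> from sklearn.svm import SVC
>>>
>>> X, y = make_classification(n_samples=200, random_state=0)
>>>
>>> # Same seed -> the internal shuffling used for Platt scaling is
>>> # reproducible, so the fitted probability estimates match exactly.
>>> p1 = SVC(probability=True, random_state=0).fit(X, y).predict_proba(X)
>>> p2 = SVC(probability=True, random_state=0).fit(X, y).predict_proba(X)
>>> print(np.allclose(p1, p2))  # True
>>>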
>>> Also, from the SVM docs
>>> <http://scikit-learn.org/stable/modules/svm.html#svm-outlier-detection>
>>>
>>>> The underlying LinearSVC
>>>> <http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC>
>>>> implementation uses a random number generator to select features when
>>>> fitting the model. It is thus not uncommon to have slightly different
>>>> results for the same input data. If that happens, try with a smaller
>>>> tol parameter.
>>>
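>>> Applied to your case, that would be something like this (a sketch; the
>>> value is only an illustration):
>>>
>>> from sklearn.svm import OneClassSVM
>>>
>>> # Tighter stopping tolerance than the default tol=1e-3; fitting should
>>> # then be less prone to run-to-run variation.
>>> clf = OneClassSVM(nu=0.5, tol=1e-6)
>>> # clf.fit(X)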
>>>
>>> Hope that helps
>>>
>>> 2017-08-03 12:15 GMT+02:00 Abhishek Raj via scikit-learn <
>>> scikit-learn at python.org>:
>>>
>>>> Hi,
>>>>
>>>> I am using one-class SVM to develop an anomaly detection model. I
>>>> observed that different training runs on the same data set output
>>>> different accuracies: one run gets as high as 98%, while another run
>>>> on the same data brings it down to 93%. Googling a little, I found
>>>> that this happens because of the random_state
>>>> <http://scikit-learn.org/stable/modules/generated/sklearn.utils.check_random_state.html>
>>>> parameter, but I am not clear on the details.
>>>>
>>>> Can anyone explain how exactly this parameter affects my training, and
>>>> how I can figure out the best value to get the model with the best
>>>> accuracy?
>>>>
>>>> Thanks,
>>>> Abhishek
>>>>