[scikit-learn] Multiple normal scenario for OCSVM
Ady Wahyudi Paundu
awpaundu at gmail.com
Wed Apr 5 23:06:29 EDT 2017
Hi Albert,
Thank you for replying.
You are right, a high FPR might indicate an overfitting problem.
I have been discussing this with friends, and our insight so far is
that I was worrying about a non-existent problem. Feeding two data
sets that are both 'normal' into the 'decision_function' of OCSVM and
reading the resulting AUROC tells us nothing about the quality of the
anomaly detector. The reading is only meaningful when the detector is
fed both the 'normal' class and the 'anomaly' class.
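To make that point concrete, here is a minimal sketch of the kind of
evaluation we mean (the Gaussian and uniform samples below are just
synthetic stand-ins for real normal and anomalous data, and the
hyper-parameters are arbitrary):

    import numpy as np
    from sklearn.svm import OneClassSVM
    from sklearn.metrics import roc_auc_score

    rng = np.random.RandomState(0)
    X_train_normal = rng.randn(500, 2)             # synthetic "normal" data
    X_test_normal  = rng.randn(200, 2)             # held-out normal data
    X_test_anomaly = rng.uniform(-6, 6, (200, 2))  # synthetic "anomalies"

    # Fit on normal data only (the one-class setting).
    clf = OneClassSVM(kernel='rbf', nu=0.1).fit(X_train_normal)

    # An AUROC computed from normal data alone says nothing; the test
    # set has to mix the normal class and the anomaly class.
    X_test = np.vstack([X_test_normal, X_test_anomaly])
    y_test = np.r_[np.ones(len(X_test_normal)),    # 1 = normal
                   np.zeros(len(X_test_anomaly))]  # 0 = anomaly

    # decision_function is larger for inliers, so it can serve directly
    # as the score for the positive (normal) class.
    scores = clf.decision_function(X_test).ravel()
    print("AUROC:", roc_auc_score(y_test, scores))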
Again thank you for your kind reply.
Best regards,
Ady
On 4/6/17, Albert Thomas <albertthomas88 at gmail.com> wrote:
> Hi Ady,
>
> Overfitting is a possible explanation. If your model learnt your normal
> scenarios too well, then every abnormal sample will be predicted as
> abnormal (so you will have good performance on anomalies), but none of
> the normal instances of the test set will fall in the learned normal
> region (so you will have a high FPR).
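>
> As a toy illustration of this effect (purely synthetic data, and a
> deliberately extreme gamma, just to show the mechanism):
>
>     import numpy as np
>     from sklearn.svm import OneClassSVM
>
>     rng = np.random.RandomState(0)
>     X_train = rng.randn(300, 2)   # normal training data
>     X_test  = rng.randn(300, 2)   # held-out normal data
>
>     # A very large gamma lets the rbf kernel memorise the training set:
>     # the learned normal region shrinks to tiny islands around the
>     # training points.
>     clf = OneClassSVM(kernel='rbf', gamma=100.0, nu=0.1).fit(X_train)
>
>     # Most held-out normal points fall outside those islands and are
>     # predicted as anomalies (-1), i.e. a very high false positive rate.
>     print(np.mean(clf.predict(X_test) == -1))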
>
> Albert
>
> On Wed, 5 Apr 2017 at 15:37, Ady Wahyudi Paundu <awpaundu at gmail.com> wrote:
>
>> Good day Scikit-Learn Masters,
>>
>> I have used Scikit-Learn's OCSVM module previously with satisfying
>> results.
>> However, on my current task I have run into this problem with
>> one-class analysis:
>>
>> In my previous cases, I used OCSVM as an anomaly detector, and the
>> normal class in each case came from a single scenario.
>> Now, I want to create one anomaly-detection system with multiple
>> normal scenarios (in this case, 3 different normal scenarios). Let's
>> say I have scenarios A, B and C, and I want to flag all data that
>> does not come from A, B or C.
>> What I have tried is combining all the training data from A, B and C
>> into one data set and fitting it with the OCSVM module. When I tested
>> the resulting model against several anomaly data sets it worked well.
>> However, when I tested it against any one of the normal scenarios, it
>> gave very high false positives (AUROC: 99%).
>>
>> So my question: is this a bad approach, i.e. combining all the
>> different normal data sets into one training set?
>> Or am I using the OCSVM wrong? (I use the 'rbf' kernel with nu and
>> gamma both set to 0.001; a minimal sketch of the setup follows below.)
>> Or is it the wrong tool, and another algorithm would fit better?
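>>
>> To make the setup concrete, here is a minimal sketch of what I am
>> doing (the synthetic arrays below just stand in for my real scenario
>> data; only the kernel, nu and gamma match what I actually use):
>>
>>     import numpy as np
>>     from sklearn.svm import OneClassSVM
>>
>>     rng = np.random.RandomState(0)
>>     # Stand-ins for the training data of the three normal scenarios.
>>     X_a = rng.randn(300, 2) + [0, 0]
>>     X_b = rng.randn(300, 2) + [4, 0]
>>     X_c = rng.randn(300, 2) + [0, 4]
>>
>>     # Combine all normal scenarios into one training set and fit OCSVM.
>>     X_train = np.vstack([X_a, X_b, X_c])
>>     ocsvm = OneClassSVM(kernel='rbf', nu=0.001, gamma=0.001)
>>     ocsvm.fit(X_train)
>>
>>     # predict() returns +1 for points inside the learned normal region
>>     # and -1 for points flagged as anomalies.
>>     print(ocsvm.predict(rng.randn(5, 2)))             # scenario-A-like data
>>     print(ocsvm.predict(rng.uniform(8, 12, (5, 2))))  # far-away 'anomalies'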
>>
>> I don't know if this is a proper question to ask here, so if it is
>> not (maybe because this is a general machine learning question), just
>> disregard it.
>>
>> Thank you in advance
>>
>> Best regards,
>> Ady
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>