<div><div class="gmail_msg" style="color:rgb(49,49,49);word-spacing:1px">Hi Ady,</div><div class="gmail_msg" style="color:rgb(49,49,49);word-spacing:1px"><br class="gmail_msg"></div><div class="gmail_msg" style="color:rgb(49,49,49);word-spacing:1px">Overfitting is a possible explanation. If your model learnt your normal scenarios too well, then every abnormal sample will be predicted as abnormal (so you will get good performance on anomalies), but none of the normal instances in the test set will fall inside the learnt normal region (so you will get a high false-positive rate).</div><div class="gmail_msg" style="color:rgb(49,49,49);word-spacing:1px"><br></div><div class="gmail_msg" style="color:rgb(49,49,49);word-spacing:1px">Albert</div><div class="gmail_msg" style="color:rgb(49,49,49);word-spacing:1px"><br></div><div class="gmail_quote"><div>On Wed, 5 Apr 2017 at 15:37, Ady Wahyudi Paundu <<a href="mailto:awpaundu@gmail.com">awpaundu@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Good day Scikit-Learn Masters,<br class="gmail_msg">
<br class="gmail_msg">
I have used Scikit-Learn's OneClassSVM (OCSVM) previously, with satisfying results.<br class="gmail_msg">
However, on my current task I have run into the following problem with one-class analysis:<br class="gmail_msg">
<br class="gmail_msg">
In my previous cases I used OCSVM as an anomaly detector, and the<br class="gmail_msg">
normal class in each case came from a single scenario.<br class="gmail_msg">
Now I want to build one anomaly-detection system covering multiple<br class="gmail_msg">
normal scenarios (in this case, 3 different normal scenarios). Let's say<br class="gmail_msg">
I have scenarios A, B and C, and I want to flag all data that does<br class="gmail_msg">
not come from A, B or C.<br class="gmail_msg">
What I have tried is combining the training data from A, B and C<br class="gmail_msg">
into one data set and fitting it with the OCSVM module. When I tested<br class="gmail_msg">
the resulting model on several anomaly data sets it worked well. However,<br class="gmail_msg">
when I tested it against any one of the normal scenarios, it gave a<br class="gmail_msg">
very high false-positive rate (AUROC: 99%).<br class="gmail_msg">
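For what it is worth, here is a minimal sketch of that setup, using synthetic 2-D clusters as stand-ins for scenarios A, B and C (all data and parameter values here are assumptions for illustration, not a recommendation):<br class="gmail_msg">

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic stand-ins for the three normal scenarios A, B and C
# (each a separate cluster in a 2-D feature space).
A = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2))
B = rng.normal(loc=[4.0, 0.0], scale=0.5, size=(200, 2))
C = rng.normal(loc=[0.0, 4.0], scale=0.5, size=(200, 2))
X_train = np.vstack([A, B, C])  # combined "normal" training set

# Illustrative parameters only; nu upper-bounds the fraction of
# training points treated as outliers.
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)

# Held-out normal data from scenario A alone: the fraction predicted
# as -1 (outlier) is the false-positive rate on normals.
A_test = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
fpr = np.mean(clf.predict(A_test) == -1)
print(f"FPR on held-out scenario-A normals: {fpr:.2f}")
```

If the FPR on clean held-out normals comes out far above nu, that points at an overly tight boundary around the training points rather than at the act of combining the scenarios itself.<br class="gmail_msg">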
<br class="gmail_msg">
So my questions: is it a bad approach to combine all the different<br class="gmail_msg">
normal data sets into one training set?<br class="gmail_msg">
Or am I using the OCSVM wrong? (I use the 'rbf' kernel<br class="gmail_msg">
with nu and gamma both set to 0.001.)<br class="gmail_msg">
Or is it the wrong tool for the case; should I try another algorithm?<br class="gmail_msg">
<br class="gmail_msg">
I don't know if this is a proper question to ask here, so if it is not<br class="gmail_msg">
(maybe because it is a general machine-learning question rather than a<br class="gmail_msg">
scikit-learn one), just disregard it.<br class="gmail_msg">
<br class="gmail_msg">
Thank you in advance<br class="gmail_msg">
<br class="gmail_msg">
Best regards,<br class="gmail_msg">
Ady<br class="gmail_msg">
_______________________________________________<br class="gmail_msg">
scikit-learn mailing list<br class="gmail_msg">
<a href="mailto:scikit-learn@python.org" class="gmail_msg" target="_blank">scikit-learn@python.org</a><br class="gmail_msg">
<a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" class="gmail_msg" target="_blank">https://mail.python.org/mailman/listinfo/scikit-learn</a><br class="gmail_msg">
</blockquote></div></div>