<div dir="ltr">Hi,<div><br></div><div>Regarding your question on how to learn the parameters of anomaly detection algorithms using only the negative samples, Nicolas and I worked on this recently. If you are interested, you can have a look at:</div><div><br></div><div>- Learning hyperparameters for unsupervised anomaly detection: <a href="https://drive.google.com/file/d/0B8Dg3PBX90KNUTg5NGNOVnFPX0hDNmJsSTcybzZMSHNPYkd3/view">https://drive.google.com/file/d/0B8Dg3PBX90KNUTg5NGNOVnFPX0hDNmJsSTcybzZMSHNPYkd3/view</a></div><div>- How to evaluate the quality of unsupervised anomaly detection algorithms?:</div><div><a href="https://drive.google.com/file/d/0B8Dg3PBX90KNenV3WjRkR09Bakx5YlNyMF9BUXVNem1hb0NR/view">https://drive.google.com/file/d/0B8Dg3PBX90KNenV3WjRkR09Bakx5YlNyMF9BUXVNem1hb0NR/view</a> </div><div><br></div><div>Best,</div><div>Albert</div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, Aug 5, 2016 at 9:34 PM Sebastian Raschka <<a href="mailto:mail@sebastianraschka.com">mail@sebastianraschka.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">> But this might be the kind of problem where you seriously ask how hard it would be to gather more data.<br>

<br>

<br>

Yeah, I agree, but this scenario is typical in the sense that it is an anomaly detection problem rather than a classification problem. I.e., you don’t have enough positive labels to fit a supervised model, so you need unsupervised learning that learns from the negative class only.<br>
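<br>
A minimal sketch of that idea, assuming synthetic 2-D features in place of the real accelerometer windows: the model is fit on negatives only, and the few positives are kept in a validation set to compare hyperparameters. The data, the train/validation split, and the small grid over nu and gamma are illustrative assumptions, not a recommended recipe.<br>

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(0)
# Synthetic stand-in for windowed accelerometer features (assumption).
X_neg = rng.normal(0.0, 1.0, size=(300, 2))  # "no bump" windows
X_pos = rng.normal(5.0, 1.0, size=(5, 2))    # the few "bump" windows

# Fit on negatives only; hold out some negatives for validation.
X_train, X_neg_val = X_neg[:200], X_neg[200:]
X_val = np.vstack([X_neg_val, X_pos])
y_val = np.r_[np.zeros(len(X_neg_val)), np.ones(len(X_pos))]  # 1 = anomaly

best = None
for nu in (0.01, 0.05, 0.1):
    for gamma in (0.01, 0.1, 1.0):
        clf = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_train)
        # decision_function is higher for inliers, so negate it
        # to obtain an anomaly score for the ROC AUC.
        auc = roc_auc_score(y_val, -clf.decision_function(X_val))
        if best is None or auc > best[0]:
            best = (auc, nu, gamma)

best_auc, best_nu, best_gamma = best
print(best_auc, best_nu, best_gamma)
```

With only 5 positives the validation score is very noisy, so it is probably safer to treat it as a rough sanity check than as a precise model-selection criterion.<br>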

<br>

Sure, supervised learning could work well, but I would also explore unsupervised learning here and see how that works for you; maybe a one-class SVM as suggested, or EM-based mixture models (<a href="http://scikit-learn.org/stable/modules/mixture.html" rel="noreferrer" target="_blank">http://scikit-learn.org/stable/modules/mixture.html</a>).<br>
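<br>
For the mixture-model route, one possible sketch is to fit a Gaussian mixture to the negative samples and flag points whose log-likelihood falls below a low percentile of the training scores. The synthetic data, the choice of 2 components, and the 1st-percentile cutoff are assumptions made for illustration; GaussianMixture is the class name in current scikit-learn releases.<br>

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Synthetic stand-in for the real features (assumption).
X_neg = rng.normal(0.0, 1.0, size=(300, 2))  # "no bump" windows
X_pos = rng.normal(5.0, 1.0, size=(5, 2))    # the few "bump" windows

# Model the density of the negative class only.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X_neg)

# Flag anything less likely than the lowest 1% of the negatives.
threshold = np.percentile(gmm.score_samples(X_neg), 1)
is_anomaly = gmm.score_samples(X_pos) < threshold
flagged = int(is_anomaly.sum())
print(flagged, "of", len(X_pos), "positives flagged")
```

The percentile plays a role similar to nu in the one-class SVM: it fixes roughly how many negatives you are willing to flag by mistake.<br>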

<br>

Best,<br>

Sebastian<br>

<br>

> On Aug 5, 2016, at 2:55 PM, Jared Gabor <<a href="mailto:jgabor.astro@gmail.com" target="_blank">jgabor.astro@gmail.com</a>> wrote:<br>

><br>

> Lots of great suggestions on how to model your problem.  But this might be the kind of problem where you seriously ask how hard it would be to gather more data.<br>

><br>

> On Thu, Aug 4, 2016 at 2:17 PM, Amita Misra <<a href="mailto:amisra2@ucsc.edu" target="_blank">amisra2@ucsc.edu</a>> wrote:<br>

> Hi,<br>

><br>

> I am currently exploring the problem of speed bump detection using accelerometer time series data.<br>

> I have extracted some features based on the mean, standard deviation, etc. within a time window.<br>

><br>

> Since the dataset is highly skewed (I have just 5 positive samples for every 300+ samples),<br>

> I was looking into<br>

><br>

> OneClassSVM<br>

> covariance.EllipticEnvelope<br>

> sklearn.ensemble.IsolationForest<br>

> but I am not sure how to use them.<br>

><br>

> What I get from the docs:<br>

><br>

> separate the positive examples and train using only negative examples<br>

> clf.fit(X_train)<br>

> and then<br>

> predict the positive examples using<br>

> clf.predict(X_test)<br>

><br>

><br>

> I am not sure what the role of the positive examples in my training dataset is then, or how I can use them to improve my classifier so that it predicts better on new samples.<br>

><br>

><br>

> Can we do something like cross-validation to learn the parameters, as in normal binary SVM classification?<br>

><br>

> Thanks,<br>

> Amita<br>

><br>

> Amita Misra<br>

> Graduate Student Researcher<br>

> Natural Language and Dialogue Systems Lab<br>

> Baskin School of Engineering<br>

> University of California Santa Cruz<br>

><br>

><br>

> _______________________________________________<br>

> scikit-learn mailing list<br>

> <a href="mailto:scikit-learn@python.org" target="_blank">scikit-learn@python.org</a><br>

> <a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/scikit-learn</a><br>


</blockquote></div>