[scikit-learn] Supervised anomaly detection in time series

Albert Thomas albertthomas88 at gmail.com
Fri Aug 5 19:40:30 EDT 2016


Hi,

About your question on how to learn the parameters of anomaly detection
algorithms using only the negative samples in your case, Nicolas and I
worked on this aspect recently. If you are interested you can have a look at:

- Learning hyperparameters for unsupervised anomaly detection:
https://drive.google.com/file/d/0B8Dg3PBX90KNUTg5NGNOVnFPX0hDNmJsSTcybzZMSHNPYkd3/view
- How to evaluate the quality of unsupervised anomaly detection algorithms?:
https://drive.google.com/file/d/0B8Dg3PBX90KNenV3WjRkR09Bakx5YlNyMF9BUXVNem1hb0NR/view


Best,
Albert

On Fri, Aug 5, 2016 at 9:34 PM Sebastian Raschka <mail at sebastianraschka.com>
wrote:

> > But this might be the kind of problem where you seriously ask how hard
> it would be to gather more data.
>
>
> Yeah, I agree, but this scenario is then typical of an anomaly detection
> problem rather than a classification problem. I.e., you don’t have enough
> positive labels to fit the model, so you need to do unsupervised learning
> and learn from the negative class only.
>
> Sure, supervised learning could work well, but I would also explore
> unsupervised learning here and see how that works for you; maybe a
> one-class SVM as suggested, or EM-algorithm-based mixture models
> (http://scikit-learn.org/stable/modules/mixture.html).
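A minimal sketch of both suggestions on synthetic stand-in data (the `nu`, `n_components`, and percentile values below are illustrative guesses, not tuned settings):

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X_train = rng.normal(size=(300, 2))                      # negative samples only
X_test = np.vstack([rng.normal(size=(10, 2)),            # 10 more normal points
                    rng.normal(loc=5.0, size=(5, 2))])   # 5 obvious anomalies

# One-class SVM: predict() returns +1 for inliers, -1 for outliers.
ocsvm = OneClassSVM(nu=0.05).fit(X_train)
svm_pred = ocsvm.predict(X_test)

# Mixture model: flag points whose log-likelihood under the fitted
# density falls below a threshold chosen on the training data alone.
gmm = GaussianMixture(n_components=1, random_state=0).fit(X_train)
threshold = np.percentile(gmm.score_samples(X_train), 5)
gmm_pred = gmm.score_samples(X_test) < threshold         # True = anomaly
```

In both cases the model is fit on the negative class only, and any labeled positives stay out of training.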
>
> Best,
> Sebastian
>
> > On Aug 5, 2016, at 2:55 PM, Jared Gabor <jgabor.astro at gmail.com> wrote:
> >
> > Lots of great suggestions on how to model your problem.  But this might
> be the kind of problem where you seriously ask how hard it would be to
> gather more data.
> >
> > On Thu, Aug 4, 2016 at 2:17 PM, Amita Misra <amisra2 at ucsc.edu> wrote:
> > Hi,
> >
> > I am currently exploring the problem of speed bump detection using
> accelerometer time series data.
> > I have extracted some features based on the mean, standard deviation, etc.,
> within a time window.
> >
> > Since the dataset is highly skewed (I have just 5 positive samples for
> every >300 negative samples), I was looking into:
> >
> > - sklearn.svm.OneClassSVM
> > - sklearn.covariance.EllipticEnvelope
> > - sklearn.ensemble.IsolationForest
> >
> > but I am not sure how to use them.
> >
> > What I get from the docs is: separate out the positive examples and train
> using only the negative examples,
> >
> > clf.fit(X_train)
> >
> > and then predict the positive examples using
> >
> > clf.predict(X_test)
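Concretely, that fit-on-negatives / predict-on-unseen workflow might look like the sketch below (synthetic stand-in data; the `contamination` value is a guess that would need tuning for real accelerometer windows):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X_train = rng.normal(size=(300, 3))                      # negative (no-bump) windows only
X_test = np.vstack([rng.normal(size=(20, 3)),            # unseen normal windows
                    rng.normal(loc=6.0, size=(5, 3))])   # 5 bump-like windows

# Fit on the negatives only, then predict on unseen data:
# +1 = normal, -1 = anomaly.
clf = IsolationForest(contamination=0.02, random_state=42)
clf.fit(X_train)
pred = clf.predict(X_test)
```

The same fit/predict pattern applies to OneClassSVM and EllipticEnvelope.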
> >
> >
> > I am not sure what the role of the positive examples in my training
> dataset then is, or how I can use them to improve my classifier so that I
> can predict better on new samples.
> >
> >
> > Can we do something like cross-validation to learn the parameters, as in
> normal binary SVM classification?
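One common workaround, sketched below on synthetic data, is to fit candidate models on negatives only and use the few labeled positives purely for evaluation (here via ROC AUC). With only 5 positives the selection is very noisy, which is the issue the label-free criteria in Albert's references try to address; all names and values here are illustrative:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(0)
X_neg = rng.normal(size=(300, 2))             # ~300 negative windows
X_pos = rng.normal(loc=4.0, size=(5, 2))      # the 5 positive (bump) windows

best_auc, best_nu = -np.inf, None
for nu in [0.01, 0.05, 0.1, 0.2]:
    clf = OneClassSVM(nu=nu).fit(X_neg[:250])           # fit on negatives only
    # Score held-out negatives plus all positives; decision_function is
    # higher for "more normal" points, so negate it for an anomaly score.
    scores = -clf.decision_function(np.vstack([X_neg[250:], X_pos]))
    y_true = np.r_[np.zeros(50), np.ones(5)]            # 1 = anomaly
    auc = roc_auc_score(y_true, scores)
    if auc > best_auc:
        best_auc, best_nu = auc, nu
```

A ranking metric like AUC is a reasonable choice here because the threshold itself (nu, contamination) is part of what is being tuned.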
> >
> > Thanks,
> > Amita
> >
> > Amita Misra
> > Graduate Student Researcher
> > Natural Language and Dialogue Systems Lab
> > Baskin School of Engineering
> > University of California Santa Cruz
> >
> >
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
> >
> >
>
>