[scikit-learn] Supervised anomaly detection in time series

Nicolas Goix goix.nicolas at gmail.com
Thu Aug 4 20:23:28 EDT 2016


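You can evaluate your hyper-parameters on only a few labeled samples.
Just don't use accuracy as your performance measure: with 5 anomalies among
roughly 500 normal samples, a model that never flags anything already scores
about 99% accuracy.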
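
As a rough illustration, here is a minimal sketch of scoring one-class
hyper-parameters with the few labeled anomalies. The synthetic data, the
helper name cv_average_precision, the gamma grid and the choice of average
precision as the metric are all assumptions made for the example, not
something taken from your dataset.

```python
import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
# Synthetic stand-in for windowed accelerometer features (assumption):
# 500 normal windows and 5 speed-bump windows, 4 features each.
X_normal = rng.normal(0.0, 1.0, size=(500, 4))
X_bump = rng.normal(4.0, 1.0, size=(5, 4))
X = np.vstack([X_normal, X_bump])
y = np.r_[np.zeros(500, dtype=int), np.ones(5, dtype=int)]  # 1 = anomaly

# Accuracy of a model that always predicts "normal": 500 / 505.
baseline_accuracy = float(np.mean(y == 0))


def cv_average_precision(gamma, nu=0.05, n_splits=5):
    """Mean average precision over stratified folds (hypothetical helper)."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in cv.split(X, y):
        # Fit on the normal samples of the training fold only.
        X_fit = X[train_idx][y[train_idx] == 0]
        clf = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(X_fit)
        # decision_function is high for inliers, so negate it to get a
        # score that is high for the anomaly class.
        anomaly_score = -clf.decision_function(X[test_idx]).ravel()
        scores.append(average_precision_score(y[test_idx], anomaly_score))
    return float(np.mean(scores))


# Compare a few candidate values of gamma (illustrative grid).
results = {gamma: cv_average_precision(gamma) for gamma in (0.01, 0.1, 1.0)}
best_gamma = max(results, key=results.get)
```

With only 5 anomalies, each of the 5 folds holds a single positive, so these
estimates are noisy; repeating the split with different seeds should give a
better idea of the spread.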

For supervised classification, training several classifiers on small
balanced subsamples usually works well, but 5 anomalies does indeed seem
very few.
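
A minimal sketch of that idea, assuming logistic regression as the base
classifier and two negatives per positive in each subsample (both are
arbitrary choices for the example, as are the helper names and the synthetic
data). Each model sees a different draw of negatives, so the ensemble as a
whole still uses much of the negative class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_balanced_ensemble(X, y, n_estimators=25, ratio=2, random_state=0):
    """Fit one classifier per subsample of all positives plus
    ratio * n_positives randomly drawn negatives (hypothetical helper)."""
    rng = np.random.RandomState(random_state)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    n_neg = min(len(neg_idx), ratio * len(pos_idx))
    models = []
    for _ in range(n_estimators):
        # Draw a fresh set of negatives for every model.
        sub_neg = rng.choice(neg_idx, size=n_neg, replace=False)
        idx = np.concatenate([pos_idx, sub_neg])
        models.append(LogisticRegression().fit(X[idx], y[idx]))
    return models


def ensemble_scores(models, X):
    """Average the predicted probability of the positive class."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)


rng = np.random.RandomState(0)
# Synthetic stand-in data (assumption): 500 normal and 5 anomalous windows.
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 4)),
               rng.normal(4.0, 1.0, size=(5, 4))])
y = np.r_[np.zeros(500, dtype=int), np.ones(5, dtype=int)]

models = fit_balanced_ensemble(X, y)

# Score new samples drawn from the same two synthetic distributions.
X_new = np.vstack([rng.normal(0.0, 1.0, size=(50, 4)),
                   rng.normal(4.0, 1.0, size=(5, 4))])
scores = ensemble_scores(models, X_new)
```

The averaged scores can then be ranked or thresholded, and evaluated with a
measure other than accuracy.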

Nicolas

On Aug 4, 2016 7:51 PM, "Amita Misra" <amisra2 at ucsc.edu> wrote:

> Sub-sampling would remove a lot of information from the negative class.
> I have more than 500 samples of the negative class and just 5 samples of
> the positive class.
>
> Amita
>
> On Thu, Aug 4, 2016 at 4:43 PM, Nicolas Goix <goix.nicolas at gmail.com>
> wrote:
>
>> Hi,
>>
>> Yes, you can use your labeled data to learn your hyper-parameters
>> through CV (you will need to sub-sample your normal class to get similar
>> proportions of normal and abnormal samples).
>>
>> You can also try supervised classification algorithms on sub-samples
>> that are 'not too highly unbalanced'.
>>
>> Nicolas
>>
>> On Thu, Aug 4, 2016 at 5:17 PM, Amita Misra <amisra2 at ucsc.edu> wrote:
>>
>>> Hi,
>>>
>>> I am currently exploring the problem of speed bump detection using
>>> accelerometer time series data.
>>> I have extracted some features based on mean, standard deviation, etc.
>>> within a time window.
>>>
>>> Since the dataset is highly skewed (I have just 5 positive samples for
>>> every > 300 samples),
>>> I was looking into
>>>
>>> sklearn.svm.OneClassSVM
>>> sklearn.covariance.EllipticEnvelope
>>> sklearn.ensemble.IsolationForest
>>>
>>> but I am not sure how to use them.
>>>
>>> What I get from the docs is: separate out the positive examples and
>>> train using only negative examples
>>>
>>> clf.fit(X_train)
>>>
>>> and then predict the positive examples using
>>>
>>> clf.predict(X_test)
>>>
>>>
>>> I am not sure what role the positive examples in my training dataset
>>> then play, or how I can use them to improve my classifier so that it
>>> predicts better on new samples.
>>>
>>>
>>> Can we do something like cross-validation to learn the parameters, as in
>>> normal binary SVM classification?
>>>
>>> Thanks,
>>> Amita
>>>
>>> Amita Misra
>>> Graduate Student Researcher
>>> Natural Language and Dialogue Systems Lab
>>> Baskin School of Engineering
>>> University of California Santa Cruz
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>>
>>
>
>
> --
> Amita Misra
> Graduate Student Researcher
> Natural Language and Dialogue Systems Lab
> Baskin School of Engineering
> University of California Santa Cruz
>
>