[scikit-learn] cross_validate() with HMM

Anni Bauer anni-bauer at outlook.com
Wed Feb 13 08:09:36 EST 2019


Hi! I want to run the folds of a k-fold cross-validation in parallel, using all 6 of my CPUs at once. My model is a hidden Markov model (HMM). In every fold I want to train it on the training portion of the data, extract the anomaly score (negative log-likelihood) of each sequence in the test portion, and evaluate the fold with a ROC curve.

I have found the function cross_validate(), which seems to provide the option of running things in parallel with n_jobs=-1.
I assume the estimator is then my HMM model.
As of now I'm using pomegranate to train the model and extract the anomaly score of the test sequences.
I don't understand how to call cross_validate() with the right arguments for my HMM. None of the examples I've seen use an HMM. I'm also confused about where to specify the number of hidden states if I'm not calling my usual pomegranate function from_samples(), which I've used before.

Also, how can I extract the anomaly scores within each fold using this function?
I'm unsure what exactly is happening within cross_validate() and how to control it the way I need.

If anyone has an example or explanation or another idea on how to run the folds in parallel, I would really appreciate it!

This is my attempt at using cross_validate(), which gets stuck or doesn't seem to run through (although I'm quite sure I'm not using it properly):

import pomegranate
from sklearn.model_selection import cross_validate
model = pomegranate.HiddenMarkovModel()

results = cross_validate(model, listToUse, y=None, groups=None, scoring=None, cv=3, n_jobs=-1, verbose=10)

print(results)
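
One possible direction, sketched under assumptions rather than tested with pomegranate: cross_validate() clones the estimator and calls fit() and score() on it in every fold, so a model that lacks that interface could be wrapped in a small class. In the sketch below, SequenceDensityEstimator, n_states and run_cv are hypothetical names, and a simple Gaussian density stands in for the HMM so that the example runs on its own. With an HMM, fit() is where the model would be built from the training sequences with the chosen number of hidden states.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_validate


class SequenceDensityEstimator(BaseEstimator):
    """Hypothetical wrapper exposing the fit/score interface cross_validate uses.

    The per-position Gaussian is only a stand-in for the HMM.
    """

    def __init__(self, n_states=3):
        # Hyperparameters are stored in __init__ so that clone() can copy them.
        # n_states is not used by the Gaussian stand-in; an HMM would use it in fit().
        self.n_states = n_states

    def fit(self, X, y=None):
        # X: one fixed-length sequence per row (an assumption of this sketch).
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0) + 1e-9
        return self

    def score_samples(self, X):
        # Log-likelihood of each sequence under the fitted stand-in model.
        X = np.asarray(X, dtype=float)
        z = (X - self.mean_) / self.std_
        log_pdf = -0.5 * z**2 - np.log(self.std_) - 0.5 * np.log(2 * np.pi)
        return log_pdf.sum(axis=1)

    def score(self, X, y=None):
        # Mean log-likelihood of the test sequences (higher is better).
        return float(np.mean(self.score_samples(X)))


def run_cv(sequences, n_jobs=-1):
    # y=None because the problem is unsupervised; n_jobs=-1 uses all CPUs.
    model = SequenceDensityEstimator(n_states=3)
    return cross_validate(model, sequences, y=None, cv=3, n_jobs=n_jobs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sequences = rng.normal(size=(30, 20))  # synthetic data for illustration
    print(run_cv(sequences)["test_score"])
```

With scoring=None, cross_validate() reports whatever score() returns, here the mean log-likelihood of the test sequences in each fold.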


Below is how I've manually set my cross-validation up as of now:

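import time

import numpy
from sklearn.model_selection import KFold

# hmm is a helper module not shown here; hmm.hmm() trains the model on the training sequences
# listToUse must be a numpy array so that it can be indexed with the fold indices
listExample = []
kfold = KFold(n_splits=10, shuffle=True)
for train, test in kfold.split(listToUse):
    listExample.append([listToUse[train], listToUse[test]])

scoreList = []

for ex in listExample:

    # train on the training portion of this fold
    hmmModel = hmm.hmm(ex[0])
    scoreListFold = []

    mid = time.time()

    # log-likelihood of each test sequence in this fold
    for li in ex[1]:
        prob = hmmModel.log_probability(li)
        scoreListFold.append(prob)

    # mean log-likelihood over the test sequences of this fold
    scoreList.append(numpy.mean(scoreListFold))

# average over all folds
avg = numpy.mean(scoreList)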
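
If per-sequence scores and a ROC value are needed for every fold, the manual loop above could also be parallelised directly with joblib, the library scikit-learn uses for n_jobs. The following is only a sketch: fit_model, log_likelihood, evaluate_fold and run_folds are hypothetical helpers, a Gaussian again stands in for the HMM, and it assumes that a 0/1 label per sequence (1 = anomalous) is available for computing the ROC AUC, which the data described here may not have.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def fit_model(train_sequences):
    # Stand-in for training the HMM: a single Gaussian over all observations.
    values = np.asarray(train_sequences, dtype=float)
    return values.mean(), values.std() + 1e-9


def log_likelihood(model, sequence):
    # Log-likelihood of one sequence under the stand-in model.
    mean, std = model
    z = (np.asarray(sequence, dtype=float) - mean) / std
    return float(np.sum(-0.5 * z**2 - np.log(std) - 0.5 * np.log(2 * np.pi)))


def evaluate_fold(sequences, labels, train_idx, test_idx):
    # Train on the training portion, then score every test sequence.
    model = fit_model(sequences[train_idx])
    # Anomaly score = negative log-likelihood (higher means more anomalous).
    scores = np.array([-log_likelihood(model, s) for s in sequences[test_idx]])
    auc = roc_auc_score(labels[test_idx], scores)
    return scores, auc


def run_folds(sequences, labels, n_splits=3, n_jobs=-1):
    # Each fold is an independent task, so the folds can run in parallel.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return Parallel(n_jobs=n_jobs)(
        delayed(evaluate_fold)(sequences, labels, train_idx, test_idx)
        for train_idx, test_idx in cv.split(sequences, labels)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data for illustration: 24 normal and 6 anomalous sequences.
    sequences = np.vstack([rng.normal(0.0, 1.0, size=(24, 20)),
                           rng.normal(4.0, 1.0, size=(6, 20))])
    labels = np.array([0] * 24 + [1] * 6)
    for fold, (scores, auc) in enumerate(run_folds(sequences, labels)):
        print(fold, len(scores), auc)
```

StratifiedKFold is used here so that every test fold contains both classes, because the ROC AUC is undefined when only one class is present.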

Thanks again!

Anni