[scikit-learn] Help with data parsing (link to stack exchange question)

Jacob Schreiber jmschreiber91 at gmail.com
Wed Jun 14 12:15:08 EDT 2017

It's unclear to me what exactly you want to do with the classification
algorithm. Is your goal to take in a binary data matrix indicating the
presence of certain k-mers and predict whether the present k-mers
indicate a susceptible or resistant genome? If so, then you need to convert
your sequences into this binary matrix (or possibly a count matrix if you
think counts are more important) such that each row corresponds to a genome
and each column corresponds to a k-mer. I don't think scikit-learn has any
built-in tools for turning a string into a k-mer encoding (possible future
PR?), so you'd have to do this manually. Let me know if that answered your
question.
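A minimal sketch of that manual conversion, assuming each genome is available as a plain string of nucleotides. The helper names `kmers` and `kmer_matrix` are hypothetical, not part of scikit-learn. Rows are genomes, columns are the k-mers observed across all genomes, and entries are 1/0 presence flags (or raw counts with `counts=True`):

```python
def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_matrix(genomes, k, counts=False):
    """Encode genomes as a matrix with one row per genome, one column per k-mer.

    Entries are 1/0 presence flags by default, or occurrence counts when
    counts=True. Columns follow the sorted list of k-mers seen in any genome;
    that list is returned too, so new genomes can be encoded the same way.
    """
    vocab = sorted({km for g in genomes for km in kmers(g, k)})
    index = {km: j for j, km in enumerate(vocab)}
    rows = []
    for g in genomes:
        row = [0] * len(vocab)
        for km in kmers(g, k):
            if counts:
                row[index[km]] += 1   # count every occurrence
            else:
                row[index[km]] = 1    # only record presence
        rows.append(row)
    return rows, vocab


X, vocab = kmer_matrix(["ACGTAC", "ACGGGG"], k=3)
print(vocab)  # ['ACG', 'CGG', 'CGT', 'GGG', 'GTA', 'TAC']
print(X)      # [[1, 0, 1, 0, 1, 1], [1, 1, 0, 1, 0, 0]]
```

The resulting `X` can be passed to any scikit-learn classifier together with a label vector `y` (e.g. a hypothetical 0 = susceptible, 1 = resistant), for instance `LogisticRegression().fit(X, y)`. When predicting on new genomes, encode them against the same `vocab` and skip k-mers that were never seen during training, so the columns line up.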

On Tue, Jun 13, 2017 at 12:36 PM, Daniel Harris <daphilip at umich.edu> wrote:

> Hello,
> I hope this is the correct email address for questions regarding support.
> I posted my question here on stack exchange:
> https://bioinformatics.stackexchange.com/q/702/842
> Thank you,
> Daniel
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn