<html><head><style>body{font-family:Helvetica,Arial;font-size:13px}</style></head><body style="word-wrap:break-word"><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">Hi Joel,</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">Yea, seems that the one-hot encoding of the transpose solves the issue.  As you say, and as I mentioned to Sebastian, it seems a bit off-usage for OneHotEncoder.  </div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">Thanks for the solution all the same though.</div> <br> <div id="bloop_sign_1474330553229102080" class="bloop_sign"><div style="font-family:helvetica,arial;font-size:13px">-- <br>Lee Zamparo<br></div></div> <br><p class="airmail_on">On September 19, 2016 at 7:48:15 PM, Joel Nothman (<a href="mailto:joel.nothman@gmail.com">joel.nothman@gmail.com</a>) wrote:</p> <blockquote type="cite" class="clean_bq"><span><div><div></div><div>




<div dir="ltr">OneHotEncoder has issues, but I think all you want

here is

<div><br></div>

<div><span style="font-size:12.8px">ohe.fit_transform(np.transpose(</span><span style="font-size:12.8px">[le.fit_transform([c for c in myguide])]))</span><br style="font-size:12.8px"></div>

<div><span style="font-size:12.8px"><br></span></div>

<div><span style="font-size:12.8px">Still, this seems like it is

far from the intended use of OneHotEncoder (which should not really

be stacked with LabelEncoder), so it's not surprising it's

tricky.</span></div>
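<div><br></div>
<div>For what it's worth, on scikit-learn 0.20 or later (an assumption; that is newer than the version discussed in this thread) the LabelEncoder step can probably be skipped, since OneHotEncoder accepts string columns and an explicit <code>categories</code> list. A minimal sketch, where <code>column</code> and <code>encoded</code> are just illustrative names:</div>
<pre>
```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

myguide = "ACGT"

# One row per base and a single column (feature): shape (4, 1).
column = np.array(list(myguide)).reshape(-1, 1)

# Fixing the alphabet keeps the output column order stable even when a
# sequence does not contain all four bases.
# Assumes scikit-learn 0.20 or later, where `categories` is available.
ohe = OneHotEncoder(categories=[["A", "C", "G", "T"]])

# fit_transform returns a sparse matrix by default; densify for display.
encoded = ohe.fit_transform(column).toarray()
print(encoded)
```
</pre>
<div>Each row of <code>encoded</code> is the one-hot vector for one base, with columns in A, C, G, T order.</div>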

</div>

<div class="gmail_extra"><br>

<div class="gmail_quote">On 20 September 2016 at 08:07, Sebastian

Raschka <span dir="ltr"><<a href="mailto:se.raschka@gmail.com" target="_blank">se.raschka@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,

Lee,<br>

<br>

maybe set `n_values=4`; this seems to do the job. I think the

problem you encountered is due to the fact that the one-hot encoder

infers the number of values for each feature (column) from the

dataset. In your example, each column had only 1 unique value:<br>

<span class=""><br>

> array([[0, 1, 2, 3],<br>

>        [0, 1, 2, 3],<br>

>        [0, 1, 2, 3]])<br>

<br></span>If you had an array like<br>

<br>

> array([[0],<br>
>        [1],<br>
>        [2],<br>
>        [3]])<br>

<br>

it should work though. Alternatively, set n_values to 4:<br>

<br>

<br>

> >>> from sklearn.preprocessing import

OneHotEncoder<br>

> >>> import numpy as np<br>

><br>

> >>> enc = OneHotEncoder(n_values=4)<br>

> >>> X = np.array([[0, 1, 2, 3]])<br>

> >>> enc.fit_transform(X).toarray()<br>

<br>

<br>

array([[ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,<br>
         0.,  0.,  1.]])<br>

<br>

and<br>

<br>

> X2 = np.array([[0, 1, 2, 3],<br>
<span class="">>                [0, 1, 2, 3],<br>
>                [0, 1, 2, 3]])<br>
><br></span>> enc.transform(X2).toarray()<br>

<br>

<br>

<br>

array([[ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,<br>
         0.,  0.,  1.],<br>
       [ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,<br>
         0.,  0.,  1.],<br>
       [ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,<br>
         0.,  0.,  1.]])<br>

<br>
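<div>To make the inference point concrete, here is a rough NumPy-only sketch. It is not OneHotEncoder's actual code, and <code>inferred_widths</code> is a made-up helper. It counts the distinct values in each column, which is how many output columns each feature ends up with when the values are inferred from the data:</div>
<pre>
```python
import numpy as np


def inferred_widths(X):
    """Count the distinct values in each column (feature) of X.

    Hypothetical helper for illustration only; a simplified model of
    how many one-hot columns each feature gets when inferred from data.
    """
    return [int(np.unique(col).size) for col in np.asarray(X).T]


# Three sequences stored as rows: every column holds one distinct value.
rows = np.array([[0, 1, 2, 3]] * 3)

# One sequence stored as a column: the single feature takes four values.
column = np.array([[0], [1], [2], [3]])

print(inferred_widths(rows))    # [1, 1, 1, 1]
print(inferred_widths(column))  # [4]
```
</pre>
<div>With the row layout every feature collapses to a single always-on column, which matches the all-ones output in the original question.</div>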

<br>

Best,<br>

Sebastian<br>

<div>

<div class="h5"><br>

<br>

> On Sep 19, 2016, at 5:45 PM, Lee Zamparo <<a href="mailto:zamparo@gmail.com">zamparo@gmail.com</a>> wrote:<br>

><br>

> Hi sklearners,<br>

><br>

> A lab-mate came to me with a problem about encoding DNA

sequences using preprocessing.OneHotEncoder, and I find it to

produce confusing results.<br>

><br>

> Suppose I have a DNA string:  myguide = ‘ACGT’<br>

><br>

> He’d like to use OneHotEncoder to transform DNA strings,

character by character, into a one hot encoded representation like

this: [[1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1]].  The

use-case seems to be solved in pandas using the dubiously named

get_dummies method (<a href="http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.get_dummies.html" rel="noreferrer" target="_blank">http://pandas.pydata.org/<wbr>pandas-docs/version/0.13.1/<wbr>generated/pandas.get_dummies.<wbr>html</a>). 

I thought that it would be trivial to do with OneHotEncoder, but it

seems strangely difficult:<br>

><br>

> In [23]: myarray = le.fit_transform([c for c in

myguide])<br>

><br>

> In [24]: myarray<br>

> Out[24]: array([0, 1, 2, 3])<br>

><br>

> In [27]: myarray = le.transform([[c for c in myguide],[c for c

in myguide],[c for c in myguide]])<br>

><br>

> In [28]: myarray<br>

> Out[28]:<br>

> array([[0, 1, 2, 3],<br>

>        [0, 1, 2, 3],<br>

>        [0, 1, 2, 3]])<br>

><br>

> In [29]: ohe.fit_transform(myarray)<br>

> Out[29]:<br>

> array([[ 1.,  1.,  1.,  1.],<br>
>        [ 1.,  1.,  1.,  1.],<br>
>        [ 1.,  1.,  1.,  1.]])    <— ????<br>

><br>

> So this is not at all what I expected.  I read the

documentation for OneHotEncoder (<a href="http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder" rel="noreferrer" target="_blank">http://scikit-learn.org/<wbr>stable/modules/generated/<wbr>sklearn.preprocessing.<wbr>OneHotEncoder.html#sklearn.<wbr>preprocessing.OneHotEncoder</a>),

but did not find it clear how it worked (also I found the example

using integers confusing).  Neither FeatureHasher nor

DictVectorizer seems to be more appropriate for transforming strings

into positional OneHot encoded arrays.  Am I missing

something, or is this operation not supported in sklearn?<br>

><br>

> Thanks,<br>

><br>

> --<br>

> Lee Zamparo<br></div>

</div>

> ______________________________<wbr>_________________<br>

> scikit-learn mailing list<br>

> <a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>

> <a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>


<br>


</blockquote>

</div>

<br></div>



<br></div></div></span></blockquote></body></html>