<html><head><style>body{font-family:Helvetica,Arial;font-size:13px}</style></head><body style="word-wrap:break-word"><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">Hi Joel,</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">Yea, seems that the one-hot encoding of the transpose solves the issue. As you say, and as I mentioned to Sebastian, it seems a bit off-usage for OneHotEncoder. </div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">Thanks for the solution all the same though.</div> <br> <div id="bloop_sign_1474330553229102080" class="bloop_sign"><div style="font-family:helvetica,arial;font-size:13px">-- <br>Lee Zamparo<br></div></div> <br><p class="airmail_on">On September 19, 2016 at 7:48:15 PM, Joel Nothman (<a href="mailto:joel.nothman@gmail.com">joel.nothman@gmail.com</a>) wrote:</p> <blockquote type="cite" class="clean_bq"><span><div><div></div><div>
<div dir="ltr">OneHotEncoder has issues, but I think all you want
here is
<div><br></div>
<div><span style="font-size:12.8px">ohe.fit_transform(np.transpose(</span><span style="font-size:12.8px">le.fit_transform([c
for c in myguide])))</span><br style="font-size:12.8px"></div>
<div><span style="font-size:12.8px"><br></span></div>
<div><span style="font-size:12.8px">Still, this seems like it is
far from the intended use of OneHotEncoder (which should not really
be stacked with LabelEncoder), so it's not surprising it's
tricky.</span></div>
</div>
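<div dir="ltr"><br>
One caveat on the snippet above: np.transpose leaves a 1-D array unchanged, so getting one row per character would need a column layout, for example via reshape(-1, 1). For a fixed alphabet, the per-character encoding Lee describes can also be written without either encoder. The following is only a plain-Python sketch, not a scikit-learn API; the names ALPHABET and one_hot_dna are made up for illustration, and the alphabet order "ACGT" is an assumption.</div>
<pre>
```python
ALPHABET = "ACGT"  # assumed fixed nucleotide alphabet (hypothetical name)


def one_hot_dna(seq, alphabet=ALPHABET):
    """Return one row per character, with a single 1 at the character's index."""
    index = {base: i for i, base in enumerate(alphabet)}
    rows = []
    for base in seq:
        row = [0] * len(alphabet)
        row[index[base]] = 1  # raises KeyError for characters outside the alphabet
        rows.append(row)
    return rows


encoded = one_hot_dna("ACGT")
```
</pre>
<div dir="ltr">Because the alphabet is fixed up front, the width of each row does not depend on which characters happen to occur in a given sequence.</div>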
<div class="gmail_extra"><br>
<div class="gmail_quote">On 20 September 2016 at 08:07, Sebastian
Raschka <span dir="ltr"><<a href="mailto:se.raschka@gmail.com" target="_blank">se.raschka@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,
Lee,<br>
<br>
maybe set `n_values=4`; this seems to do the job. I think the
problem you encountered is due to the fact that the one-hot encoder
infers the number of values for each feature (column) from the
dataset. In your example, each column had only 1 unique
value:<br>
<span class=""><br>
> array([[0, 1, 2, 3],<br>
> [0, 1, 2, 3],<br>
> [0, 1, 2, 3]])<br>
<br></span>If you had an array like<br>
<br>
> array([[0],<br>
> [1],<br>
> [2],<br>
> [3]])<br>
<br>
it should work though. Alternatively, set n_values to 4:<br>
<br>
<br>
> >>> from sklearn.preprocessing import
OneHotEncoder<br>
> >>> import numpy as np<br>
><br>
> >>> enc = OneHotEncoder(n_values=4)<br>
> >>> X = np.array([[0, 1, 2, 3]])<br>
> >>> enc.fit_transform(X).toarray()<br>
<br>
<br>
array([[ 1., 0., 0., 0., 0.,
1., 0., 0., 0., 0., 1.,
0., 0.,<br>
0., 0., 1.]])<br>
<br>
and<br>
<br>
> X2 = np.array([[0, 1, 2, 3],<br>
<span class="">>
[0, 1, 2, 3],<br>
> [0, 1,
2, 3]])<br>
><br></span>> enc.transform(X2).toarray()<br>
<br>
<br>
<br>
array([[ 1., 0., 0., 0., 0.,
1., 0., 0., 0., 0., 1.,
0., 0.,<br>
0., 0., 1.],<br>
[ 1., 0., 0.,
0., 0., 1., 0., 0., 0.,
0., 1., 0., 0.,<br>
0., 0., 1.],<br>
[ 1., 0., 0.,
0., 0., 1., 0., 0., 0.,
0., 1., 0., 0.,<br>
0., 0., 1.]])<br>
<br>
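<div>The column-wise behaviour described above can be mimicked with a small toy model. This is a simplified sketch in plain Python, not scikit-learn's implementation; the function name one_hot_by_column is hypothetical, and its n_values argument only imitates the idea of fixing the number of values per column.</div>
<pre>
```python
def one_hot_by_column(rows, n_values=None):
    """Toy model of column-wise one-hot encoding (not scikit-learn's code)."""
    n_cols = len(rows[0])
    categories = []
    for j in range(n_cols):
        if n_values is None:
            # Infer the categories from the values observed in column j.
            cats = sorted({row[j] for row in rows})
        else:
            # Assume every column can take the values 0 .. n_values - 1.
            cats = list(range(n_values))
        categories.append(cats)

    encoded = []
    for row in rows:
        out = []
        for j, value in enumerate(row):
            # One indicator per category of column j.
            out.extend(1 if value == c else 0 for c in categories[j])
        encoded.append(out)
    return encoded


stacked = [[0, 1, 2, 3]] * 3          # three samples, four features
column = [[0], [1], [2], [3]]         # four samples, one feature

inferred = one_hot_by_column(stacked)           # one observed value per column
fixed = one_hot_by_column(stacked, n_values=4)  # four values per column
per_position = one_hot_by_column(column)        # one row per position
```
</pre>
<div>In this model, the stacked array yields one indicator per column when categories are inferred, 16 indicators per row when four values are fixed per column, and the column layout yields one four-wide row per position.</div>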
<br>
Best,<br>
Sebastian<br>
<div>
<div class="h5"><br>
<br>
> On Sep 19, 2016, at 5:45 PM, Lee Zamparo <<a href="mailto:zamparo@gmail.com">zamparo@gmail.com</a>> wrote:<br>
><br>
> Hi sklearners,<br>
><br>
> A lab-mate came to me with a problem about encoding DNA
sequences using preprocessing.OneHotEncoder, and I find it to
produce confusing results.<br>
><br>
> Suppose I have a DNA string: myguide = ‘ACGT’<br>
><br>
> He’d like to use OneHotEncoder to transform DNA strings,
character by character, into a one hot encoded representation like
this: [[1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1]]. The
use-case seems to be solved in pandas using the dubiously named
get_dummies method (<a href="http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.get_dummies.html" rel="noreferrer" target="_blank">http://pandas.pydata.org/<wbr>pandas-docs/version/0.13.1/<wbr>generated/pandas.get_dummies.<wbr>html</a>).
I thought that it would be trivial to do with OneHotEncoder, but it
seems strangely difficult:<br>
><br>
> In [23]: myarray = le.fit_transform([c for c in
myguide])<br>
><br>
> In [24]: myarray<br>
> Out[24]: array([0, 1, 2, 3])<br>
><br>
> In [27]: myarray = le.transform([[c for c in myguide],[c for c
in myguide],[c for c in myguide]])<br>
><br>
> In [28]: myarray<br>
> Out[28]:<br>
> array([[0, 1, 2, 3],<br>
> [0, 1, 2, 3],<br>
> [0, 1, 2, 3]])<br>
><br>
> In [29]: ohe.fit_transform(myarray)<br>
> Out[29]:<br>
> array([[ 1., 1., 1., 1.],<br>
> [ 1., 1., 1.,
1.],<br>
> [ 1., 1., 1.,
1.]]) <— ????<br>
><br>
> So this is not at all what I expected. I read the
documentation for OneHotEncoder (<a href="http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder" rel="noreferrer" target="_blank">http://scikit-learn.org/<wbr>stable/modules/generated/<wbr>sklearn.preprocessing.<wbr>OneHotEncoder.html#sklearn.<wbr>preprocessing.OneHotEncoder</a>),
but did not find it clear how it worked (also I found the example
using integers confusing). Neither FeatureHasher nor
DictVectorizer seems to be more appropriate for transforming strings
into positional OneHot encoded arrays. Am I missing
something, or is this operation not supported in sklearn?<br>
><br>
> Thanks,<br>
><br>
> --<br>
> Lee Zamparo<br></div>
</div>
> ______________________________<wbr>_________________<br>
> scikit-learn mailing list<br>
> <a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>
> <a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
<br>
</blockquote>
</div>
<br></div>
<br></div></div></span></blockquote></body></html>