[scikit-learn] One-hot encoding
Sarah Wait Zaranek
sarah.zaranek at gmail.com
Mon Feb 5 00:31:19 EST 2018
If I use the n+1 approach (highest value + 1 for each column), then I get
the correct matrix, except with extra columns of zeros:
>>> test
array([[0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 1.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0.]])
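The three outputs in this thread are consistent with a simple offset rule: each feature gets a block of n_values[j] columns, and a value v in feature j lands in column offset[j] + v. The following is a minimal pure-Python sketch of that arithmetic, not scikit-learn's actual implementation; the helper names legacy_one_hot and drop_zero_columns are hypothetical, and summing colliding entries is an assumption chosen to match the 2. that appears in the n_values=[3,2,4] output.

```python
from itertools import accumulate


def legacy_one_hot(rows, n_values):
    """Hypothetical sketch: column index = offset of feature j + value."""
    # Offsets are the running totals of the per-feature widths.
    offsets = [0] + list(accumulate(n_values))
    width = offsets[-1]
    encoded_rows = []
    for row in rows:
        encoded = [0.0] * width
        for j, value in enumerate(row):
            # Assumption: entries that land on the same column are summed.
            encoded[offsets[j] + value] += 1.0
        encoded_rows.append(encoded)
    return encoded_rows


def drop_zero_columns(matrix):
    """Keep only the columns that are non-zero in at least one row."""
    keep = [c for c in range(len(matrix[0])) if any(r[c] for r in matrix)]
    return [[r[c] for c in keep] for r in matrix]


X = [[7, 0, 3], [1, 2, 0], [0, 2, 1], [1, 0, 2]]

# Highest value + 1 per column: 8 + 3 + 4 = 15 columns, some all zero.
wide = legacy_one_hot(X, [8, 3, 4])

# Removing the all-zero columns leaves the 9-column default result.
compact = drop_zero_columns(wide)

# Number of unique levels per column: blocks are too narrow, so the 7 in
# feature 0 spills into a later block and two entries in row 1 collide.
collided = legacy_one_hot(X, [3, 2, 4])
```

Under these assumptions, wide reproduces the 15-column matrix above, compact reproduces the matrix obtained without n_values, and collided reproduces the matrix with the 2. in it. This suggests that n_values has to be the highest value + 1 and that the unused columns need to be dropped afterwards to match the default output.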
On Mon, Feb 5, 2018 at 12:25 AM, Sarah Wait Zaranek <sarah.zaranek at gmail.com> wrote:
> Hi Joel -
>
> Conceptually, that makes sense. But when I set n_values, I can't make
> the result match what I get when I don't specify it. See below; I used
> the number of unique levels per column.
>
> >>> from sklearn.preprocessing import OneHotEncoder
> >>> enc = OneHotEncoder(sparse=False)
> >>> test = enc.fit_transform([[7, 0, 3], [1, 2, 0], [0, 2, 1], [1, 0, 2]])
> >>> test
> array([[0., 0., 1., 1., 0., 0., 0., 0., 1.],
> [0., 1., 0., 0., 1., 1., 0., 0., 0.],
> [1., 0., 0., 0., 1., 0., 1., 0., 0.],
> [0., 1., 0., 1., 0., 0., 0., 1., 0.]])
> >>> enc = OneHotEncoder(sparse=False, n_values=[3, 2, 4])
> >>> test = enc.fit_transform([[7, 0, 3], [1, 2, 0], [0, 2, 1], [1, 0, 2]])
> >>> test
> array([[0., 0., 0., 1., 0., 0., 0., 1., 1.],
> [0., 1., 0., 0., 0., 2., 0., 0., 0.],
> [1., 0., 0., 0., 0., 1., 1., 0., 0.],
> [0., 1., 0., 1., 0., 0., 0., 1., 0.]])
>
> Cheers,
> Sarah
>
> On Mon, Feb 5, 2018 at 12:02 AM, Joel Nothman <joel.nothman at gmail.com> wrote:
>
>> If each input column is encoded as a value from 0 to the (number of
>> possible values for that column - 1) then n_values for that column should
>> be the highest value + 1, which is also the number of levels per column.
>> Does that make sense?
>>
>> Actually, I've realised there's a somewhat slow and unnecessary bit of
>> code in the one-hot encoder: where the COO matrix is converted to CSR. I
>> suspect this was done because most of our ML algorithms perform better on
>> CSR, or else to maintain backwards compatibility with an earlier
>> implementation.
>>