[scikit-learn] Understanding sklearn.tree._tree.value object
Adrin
adrin.jalali at gmail.com
Tue Oct 9 03:43:52 EDT 2018
I'm not sure that would make sense. If, during training, you tell the
model there's only one class for a column, then that's all the model knows.
In your case, if all samples belong to class 1 in the training data, then
as far as the model is concerned, all samples belong to class 1. If you
want to interpret the results, you can combine the info you get from
`clf.tree_.value` with `clf.classes_`, and then you should be fine.
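Here is a minimal sketch of that combination, reusing the data from the original question below. `node_class_counts` is a hypothetical helper, not part of scikit-learn. It assumes unit sample weights. It also normalizes each row before scaling by the node's weighted sample count, because newer scikit-learn releases store class fractions in `tree_.value` rather than raw counts. `random_state=0` is only there to make runs repeatable.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.tree import DecisionTreeClassifier


def node_class_counts(clf, node_id):
    """Hypothetical helper: one {class label: sample count} dict per output.

    Assumes unit sample weights, so weighted counts equal sample counts.
    """
    # classes_ is a list of arrays for multi-output trees, one array otherwise
    classes = clf.classes_ if clf.n_outputs_ > 1 else [clf.classes_]
    value = clf.tree_.value[node_id]  # shape (n_outputs, max_n_classes)
    n_weighted = clf.tree_.weighted_n_node_samples[node_id]
    result = []
    for k, labels in enumerate(classes):
        row = value[k, : len(labels)]  # drop the zero padding
        # Normalizing first works whether value holds counts or fractions
        counts = np.rint(row / row.sum() * n_weighted).astype(int)
        result.append(dict(zip(labels.tolist(), counts.tolist())))
    return result


X = [[2, 51], [3, 20], [5, 30], [7, 1], [20, 46], [25, 25], [45, 70]]
Y = [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2], [1, 2], [1], [1]]
y = MultiLabelBinarizer().fit_transform(Y)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
root = node_class_counts(clf, 0)
print(root)  # [{1: 7}, {0: 2, 1: 5}, {0: 4, 1: 3}]
```

Each dict maps a label from `clf.classes_[k]` to its count. The root therefore reads as seven samples carrying label 1 in the first output, not seven samples that "disagree" with it.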
On Tue, 9 Oct 2018 at 05:27 Pranav Ashok <pranavashok at gmail.com> wrote:
> Hi Adrin,
>
> Thanks for the clarification. Is there a right way of letting
> DecisionTreeClassifier know that the first column can take either 0 or 1, but
> in the current dataset we are only using 0?
>
> For example, we can let MultiLabelBinarizer know that we have three
> classes by instantiating it like this: MultiLabelBinarizer(classes=[1, 2, 3]).
>
> I tried class_weight=[{0: 1, 1: 1}, {0: 1, 1: 1}, {0: 1, 1: 1}] but that
> doesn't work.
>
> Thanks,
> Pranav
>
> On Mon, Oct 8, 2018 at 2:32 PM Adrin <adrin.jalali at gmail.com> wrote:
>
>> Hi Pranav,
>>
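>> The reason you're getting that output is that your first column has a
>> single value (1), and that becomes your "first" (and only) class for that
>> column. So in each row you're interpreting, the first value counts the
>> samples with label 1, and the second value is just zero padding, because
>> every output gets as many slots as the output with the most classes.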
>>
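To see that single class directly, here is a small check on the data from the original question. `random_state=0` is an addition for repeatability. `n_classes_` and `classes_` have one entry per output column, and `tree_.value` is padded to the largest class count across all outputs.

```python
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.tree import DecisionTreeClassifier

X = [[2, 51], [3, 20], [5, 30], [7, 1], [20, 46], [25, 25], [45, 70]]
Y = [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2], [1, 2], [1], [1]]
y = MultiLabelBinarizer().fit_transform(Y)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# One entry per output column: the first column only ever contains 1
print([int(n) for n in clf.n_classes_])    # [1, 2, 2]
print([c.tolist() for c in clf.classes_])  # [[1], [0, 1], [0, 1]]

# value has shape (node_count, n_outputs, max_n_classes)
print(clf.tree_.value.shape[1:])           # (3, 2)
```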
>> To understand it better, try running this code:
>>
>> >>> from sklearn.preprocessing import MultiLabelBinarizer
>> >>> from sklearn.tree import DecisionTreeClassifier
>> >>>
>> >>> X = [[2, 51], [3, 20], [5, 30], [7, 1], [20, 46], [25, 25], [45, 70]]
>> >>> Y = [[2,3],[1,2,3],[1,2,3],[1,2],[1,2],[1],[1]]
>> >>>
>> >>> y = MultiLabelBinarizer().fit_transform(Y) + 40
>> >>> y[0, 1] = 0
>> >>>
>> >>> clf = DecisionTreeClassifier().fit(X, y)
>> >>> print(clf.tree_.value)
>> [[[1. 6. 0.]
>>   [1. 2. 4.]
>>   [4. 3. 0.]]
>>
>>  [[1. 2. 0.]
>>   [1. 0. 2.]
>>   [0. 3. 0.]]
>>
>>  [[0. 2. 0.]
>>   [0. 0. 2.]
>>   [0. 2. 0.]]
>>
>>  [[1. 0. 0.]
>>   [1. 0. 0.]
>>   [0. 1. 0.]]
>>
>>  [[0. 4. 0.]
>>   [0. 2. 2.]
>>   [4. 0. 0.]]
>>
>>  [[0. 2. 0.]
>>   [0. 0. 2.]
>>   [2. 0. 0.]]
>>
>>  [[0. 2. 0.]
>>   [0. 2. 0.]
>>   [2. 0. 0.]]]
>>
>>
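The padding only exists in `tree_.value`. The sketch below uses the example above, again with `random_state=0` added for repeatability. `predict_proba` returns one array per output, and its columns match `clf.classes_[k]` exactly, so it can be easier to interpret than the raw node values.

```python
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.tree import DecisionTreeClassifier

X = [[2, 51], [3, 20], [5, 30], [7, 1], [20, 46], [25, 25], [45, 70]]
Y = [[2, 3], [1, 2, 3], [1, 2, 3], [1, 2], [1, 2], [1], [1]]

y = MultiLabelBinarizer().fit_transform(Y) + 40
y[0, 1] = 0

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# tree_.value pads every output to the largest class count (here 3)
print(clf.tree_.value.shape[1:])  # (3, 3)

# predict_proba returns one array per output, with no padding:
# its columns line up exactly with clf.classes_[k]
for labels, proba in zip(clf.classes_, clf.predict_proba(X)):
    print(labels.tolist(), proba.shape)
# [40, 41] (7, 2)
# [0, 40, 41] (7, 3)
# [40, 41] (7, 2)
```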
>> On Mon, 8 Oct 2018 at 20:53 Pranav Ashok <pranavashok at gmail.com> wrote:
>>
>>> I have a multi-class multi-label decision tree learnt using the
>>> DecisionTreeClassifier class. The input looks as follows:
>>>
>>> X = [[2, 51], [3, 20], [5, 30], [7, 1], [20, 46], [25, 25], [45, 70]]
>>> Y = [[1,2,3],[1,2,3],[1,2,3],[1,2],[1,2],[1],[1]]
>>>
>>> I have used MultiLabelBinarizer to convert Y into
>>>
>>> [[1 1 1]
>>>  [1 1 1]
>>>  [1 1 1]
>>>  [1 1 0]
>>>  [1 1 0]
>>>  [1 0 0]
>>>  [1 0 0]]
>>>
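Declaring the classes up front, as in the MultiLabelBinarizer call mentioned earlier in the thread, only fixes the column order of the binarized matrix. The sketch below shows that conversion. The column for label 1 is still constant, so the tree will still see a single class for that output.

```python
from sklearn.preprocessing import MultiLabelBinarizer

Y = [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2], [1, 2], [1], [1]]

# classes= fixes which label maps to which column
mlb = MultiLabelBinarizer(classes=[1, 2, 3])
y = mlb.fit_transform(Y)

print(mlb.classes_.tolist())  # [1, 2, 3]
print(y.tolist())
# [[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0], [1, 0, 0], [1, 0, 0]]

# Every sample carries label 1, so this column holds a single value
print(sorted(set(y[:, 0].tolist())))  # [1]
```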
>>>
>>> After training, `clf.tree_.value` looks as follows:
>>>
>>> array([[[7., 0.],
>>>         [2., 5.],
>>>         [4., 3.]],
>>>
>>>        [[3., 0.],
>>>         [0., 3.],
>>>         [0., 3.]],
>>>
>>>        [[4., 0.],
>>>         [2., 2.],
>>>         [4., 0.]],
>>>
>>>        [[2., 0.],
>>>         [0., 2.],
>>>         [2., 0.]],
>>>
>>>        [[2., 0.],
>>>         [2., 0.],
>>>         [2., 0.]]])
>>>
>>> I had the impression that, for each node, the value array contains a list of
>>> lists [[n_1, y_1], [n_2, y_2], [n_3, y_3]] such that n_i is the number of
>>> samples disagreeing with class i and y_i is the number of samples agreeing
>>> with class i. But this output doesn't make sense under that interpretation.
>>>
>>> For example, the root node has the value [[7,0],[2,5],[4,3]]. According to my
>>> interpretation, this would mean that 7 samples disagree with class 1; 2 disagree
>>> with class 2 and 5 agree with class 2; and 4 disagree with class 3 and 3 agree
>>> with class 3, which, according to the input dataset, is wrong.
>>>
>>> Could someone please help me understand the semantics of `tree_.value` for multi-label decision trees?
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>