[scikit-learn] tree visualization with class names in leaves
Sebastian Raschka
mail at sebastianraschka.com
Tue Oct 25 08:45:45 EDT 2016
Hi, Gregory,
> I don't use the iris dataset. My classes are distributed in my Y array.
Yeah, I just used this as a simple example :).
> the nodes of the graphical tree seem to be filled with the predominant class
I think that’s right: it gets the class name of the majority class at each node via "class_name = class_names[np.argmax(value)]" (https://github.com/scikit-learn/scikit-learn/blob/3a106fc792eb8e70e1fd078e351ba42487d3214d/sklearn/tree/export.py#L286)
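To illustrate, here is a minimal sketch of that lookup done by hand outside export_graphviz. It assumes a tree fitted on the Iris data with integer labels 0, 1, 2. The helper node_majority_classes is hypothetical (not part of scikit-learn); it just reads clf.tree_.value the same way as the linked line:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def node_majority_classes(clf, class_names):
    """Hypothetical helper: majority-class name for every node of a fitted
    tree, mirroring class_names[np.argmax(value)] in export_graphviz."""
    names = []
    for node_id in range(clf.tree_.node_count):
        # value[node_id] has shape (n_outputs, n_classes); use the single output.
        # Depending on the scikit-learn version these are class counts or
        # class fractions; argmax picks the same class either way.
        value = clf.tree_.value[node_id][0]
        names.append(class_names[np.argmax(value)])
    return names


iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
class_names = ['setosa', 'versicolor', 'virginica']  # label 0, 1, 2 -> name
node_names = node_majority_classes(clf, class_names)

for node_id, name in enumerate(node_names):
    is_leaf = clf.tree_.children_left[node_id] == -1  # -1 marks a leaf
    print(node_id, "leaf" if is_leaf else "split", name)
```

For any sample, the name stored at the leaf it lands in (clf.apply) matches the class that predict() returns.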
> in a vector with the classes in alphabetical order (the same order as in clf.classes_)
Yes, it should be in ascending, alphanumerical order. I'm not sure whether this is still a general recommendation in sklearn 0.18, but I typically convert string class labels to integers before I feed them to a classifier (though it seems to work either way now):
    >>> from sklearn.preprocessing import LabelEncoder
    >>> le = LabelEncoder()
    >>> y = le.fit_transform(labels)  # labels: array of string class labels
    >>> le.classes_
    array(['Setosa', 'Versicolor', 'Virginica'],
          dtype='<U21')
    >>> import numpy as np
    >>> np.bincount(y)
    array([50, 50, 50])
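Here is a short sketch of both routes end to end. It assumes string labels built from the Iris targets for illustration; the capitalized names and max_depth=2 are arbitrary choices, not taken from Gregory's data. Either way, the list passed as class_names has to follow the integer class order. That order is clf.classes_ when you fit on the strings directly, and le.classes_ when you encode first:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
X = iris.data
# Illustrative string labels derived from the Iris targets (assumption)
labels = np.array(['Setosa', 'Versicolor', 'Virginica'])[iris.target]

# Route 1: fit on the string labels directly; clf.classes_ is sorted.
clf_str = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, labels)
dot_str = export_graphviz(clf_str, out_file=None,
                          feature_names=iris.feature_names,
                          class_names=clf_str.classes_.tolist())

# Route 2: encode to integers first; le.classes_ maps code i -> name.
le = LabelEncoder()
y = le.fit_transform(labels)
clf_int = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
dot_int = export_graphviz(clf_int, out_file=None,
                          feature_names=iris.feature_names,
                          class_names=le.classes_.tolist())
```

With out_file=None, export_graphviz returns the DOT source as a string instead of writing to a file. Both routes should give the same tree with the same "class = ..." line in each node.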
Best,
Sebastian
> On Oct 25, 2016, at 3:00 AM, greg g <greg315 at hotmail.fr> wrote:
>
> Hi Sebastian,
> Thanks for your answer.
> I don't use the iris dataset. My classes are distributed in my Y array.
> It seems that I can get the classes in alphabetical order with
> > clf.classes_
> where clf is my tree.
> And with
> > export_graphviz(clf, out_file=dot_data, feature_names=FEATURES, class_names=clf.classes_)
> the nodes of the graphical tree seem to be filled with the predominant class and the sample distribution in a vector with the classes in alphabetical order (the same order as in clf.classes_)
> I have to confirm that with more classes.
>
> Regards
> Gregory
>
> From: scikit-learn <scikit-learn-bounces+greg315=hotmail.fr at python.org> on behalf of Sebastian Raschka <se.raschka at gmail.com>
> Sent: Monday, October 24, 2016 17:47
> To: Scikit-learn user and developer mailing list
> Subject: Re: [scikit-learn] tree visualization with class names in leaves
>
> Hi, Greg,
> If you provide the `class_names` argument, a “class” label naming the majority class will be added at the bottom of each node. For instance, for the Iris dataset with class labels 0, 1, 2, you can provide `class_names` as ['setosa', 'versicolor', 'virginica'], where 0 -> 'setosa', 1 -> 'versicolor', 2 -> 'virginica'.
>
> Best,
> Sebastian
>
> > On Oct 24, 2016, at 10:18 AM, greg g <greg315 at hotmail.fr> wrote:
> >
> > Hi,
> > I'm just getting started with scikit-learn and would like to visualize a classification tree with class names displayed in the leaves, as shown in the sklearn.tree documentation (http://scikit-learn.org/stable/modules/tree.html), where we find class='virginica', etc.
> > I made a tree from a 2D array X (n1 samples, n2 features) and a 1D array Y (n1 corresponding classes) such that Y(i) is the class of the sample X(i, …).
> > After that, I get correct predictions using predict().
> > Then I use the function
> > export_graphviz(clf, out_file=dot_data, feature_names=FEATURES)
> > with FEATURES being the array of my n2 features names in the same order as in X
> > I obtain the tree .png but can’t find a way to have the correct class names in the leaves…
> > In export_graphviz(), should I use the optional class_names parameter, and if so, how?
> > Thanks for any help
> >
> > Gregory, Toulouse FRANCE
> >
> >
> >
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn