API Discussion: Where shall we put the plotting functions?
See https://github.com/scikit-learn/scikit-learn/issues/13448

We've introduced several plotting functions (e.g., plot_tree and plot_partial_dependence) and will introduce more (e.g., plot_decision_boundary) in the future. Consequently, we need to decide where to put these functions. Currently, there are three proposals:

(1) sklearn.plot.plot_YYY (e.g., sklearn.plot.plot_tree)
(2) sklearn.plot.XXX.plot_YYY (e.g., sklearn.plot.tree.plot_tree)
(3) sklearn.XXX.plot.plot_YYY (e.g., sklearn.tree.plot.plot_tree; note that we won't support from sklearn.XXX import plot_YYY)

Joel Nothman, Gael Varoquaux and I decided to post this on the mailing list to invite opinions.

Thanks
Hanmin Qin
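The three proposals differ only in the path at which a function such as plot_tree would be reachable. A minimal sketch of that difference, using SimpleNamespace objects as stand-ins for modules: the layout_* names and the plot_tree stub are hypothetical, nothing here imports scikit-learn, and the stub only returns a string instead of drawing anything.

```python
from types import SimpleNamespace


def plot_tree(estimator_name):
    """Hypothetical stand-in for a plotting function; draws nothing."""
    return f"tree plot for {estimator_name}"


# (1) sklearn.plot.plot_YYY: one flat plotting namespace.
layout_1 = SimpleNamespace(plot=SimpleNamespace(plot_tree=plot_tree))

# (2) sklearn.plot.XXX.plot_YYY: a plotting namespace with per-module children.
layout_2 = SimpleNamespace(
    plot=SimpleNamespace(tree=SimpleNamespace(plot_tree=plot_tree))
)

# (3) sklearn.XXX.plot.plot_YYY: each module carries its own plot child,
# and the function is deliberately not exposed on the module itself.
layout_3 = SimpleNamespace(
    tree=SimpleNamespace(plot=SimpleNamespace(plot_tree=plot_tree))
)

calls = [
    layout_1.plot.plot_tree("clf"),
    layout_2.plot.tree.plot_tree("clf"),
    layout_3.tree.plot.plot_tree("clf"),
]
```

All three paths reach the same function; the choice affects discoverability and what each namespace has to import, not behaviour.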
As a user, I feel that (2), "sklearn.plot.XXX.plot_YYY", best allows for future expansion of sub-namespaces in a tractable way that is also easy to understand during code review. For example, sklearn.plot.tree.plot_forest() or sklearn.plot.lasso.plot_*.

Just my opinion.

J.B.

(In reply to Hanmin Qin <qinhanmin2005@sina.com>, Tue, Apr 2, 2019, 23:40)
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
My preference would be for (1). I don't think the sub-namespace in (2) is necessary, and don't like (3), as I would prefer the plotting functions to be all in the same namespace sklearn.plot.

Andrew

<~~~~~~~~~~~~~~~~~~~~~~~~~~~>
J. Andrew Howe, PhD
LinkedIn Profile <http://www.linkedin.com/in/ahowe42>
ResearchGate Profile <http://www.researchgate.net/profile/John_Howe12/>
Open Researcher and Contributor ID (ORCID) <http://orcid.org/0000-0002-3553-1990>
Github Profile <http://github.com/ahowe42>
Personal Website <http://www.andrewhowe.com>
I live to learn, so I can learn to live. - me
<~~~~~~~~~~~~~~~~~~~~~~~~~~~>

(In reply to Hanmin Qin <qinhanmin2005@sina.com>, Tue, Apr 2, 2019, 3:40 PM)
I think #1, if any of these... Plotting functions should hopefully be as general as possible, so tagging them with a specific type of estimator will, in some scikit-learn utopia, be unnecessary.

If a general plotter is built, where does it live under the estimator-specific namespace options? It feels awkward to put it under every estimator's namespace.

Then again, there might be a #4 where there is no plot module and plotting classes live under groups of utilities like introspection, cross-validation or something?...

(In reply to Andrew Howe <ahowe42@gmail.com>, Wed, Apr 3, 2019, 8:54 PM)
+1 for option 1 and +0.5 for 3. Do we anticipate that many plotting functions will be added? If it's just a dozen or fewer, putting them all into a single namespace sklearn.plot might be easier.

This would also avoid discussion about where to put some generic plotting functions (e.g. https://github.com/scikit-learn/scikit-learn/issues/13448#issuecomment-47834...).

Roman

(In reply to Trevor Stephens, 03/04/2019 12:06)
I think what was not clear from the question is that there are actually quite different kinds of plotting functions, and many of these are tied to existing code.

Right now we have some that are specific to trees (plot_tree) and to gradient boosting (plot_partial_dependence).

I think we want more general functions, and plot_partial_dependence has been extended to general estimators.

However, the plotting functions might be generic with respect to the estimator, but relate to a specific function, say plotting results of GridSearchCV. Then one might argue that having the plotting function close to GridSearchCV might make sense. Similarly for plotting partial dependence plots and feature importances, it might be a bit strange to have the plotting functions not next to the functions that compute these.

Another question is whether the plotting functions also "do the work" in some cases: Do we want plot_partial_dependence also to compute the partial dependence? (I would argue yes, but either way the result is a bit strange.) In that case you have somewhat of the same functionality in two different modules, unless you also put the "compute partial dependence" function in the plotting module as well, which is a bit strange.

Maybe we could inform this discussion by listing candidate plotting functions, and also considering whether they "do the work" and where the "work" function is.

Other examples are plotting the confusion matrix, which probably should also compute the confusion matrix (it's fast and so that would be convenient), and so it would "duplicate" functionality from the metrics module.

Plotting learning curves and validation curves should probably not do the work as it's pretty involved, and so someone would need to import the learning and validation curves from model selection, and then the plotting functions from a plotting module.

Calibration curves, P/R curves and ROC curves are also pretty fast to compute (and passing around the arguments is somewhat error prone), so I would say the plotting functions for these should do the work as well.

Anyway, you can see that many plotting functions are actually associated with functions in existing modules and the interactions are a bit unclear.

The only plotting functions I haven't mentioned so far that I thought about in the past are "2d scatter" and "plot decision function". These would be kind of generic, but mostly used in the examples. Though having a discrete 2d scatter function would be pretty nice (plt.scatter doesn't allow legends and makes it hard to use qualitative color maps).

I think I would vote for option (1), "sklearn.plot.plot_zzz", but the case is not really that clear.

Cheers,
Andy

(In reply to Roman Yurchak via scikit-learn, 4/3/19 7:35 AM)
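The distinction between a plotting function that "does the work" and one that only draws precomputed results can be sketched as follows. This is an illustration under loud assumptions: confusion_counts, render_confusion and plot_confusion are hypothetical names, not scikit-learn functions, and the "plot" is a plain text grid so that the example runs without a plotting library.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Hypothetical 'work' function: count (true, predicted) label pairs."""
    return Counter(zip(y_true, y_pred))


def render_confusion(counts, labels):
    """Plot-only style: takes precomputed counts and formats them as text."""
    rows = []
    for t in labels:
        rows.append(" ".join(str(counts.get((t, p), 0)) for p in labels))
    return "\n".join(rows)


def plot_confusion(y_true, y_pred, labels):
    """'Does the work' style: computes the counts, then renders them."""
    return render_confusion(confusion_counts(y_true, y_pred), labels)


y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
labels = [0, 1]

# Two-step use: the caller imports the work function and the renderer.
two_step = render_confusion(confusion_counts(y_true, y_pred), labels)

# One-step use: the plotting function depends on the work function itself.
one_step = plot_confusion(y_true, y_pred, labels)
```

The one-step form is more convenient for cheap computations, at the cost of the plotting code depending on the module that holds the work function; the two-step form keeps the modules separate but asks the caller to pass results around.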
With option 1, sklearn.plot is likely to import large chunks of the library (particularly, but not exclusively, if the plotting function "does the work" as Andy suggests). This is under the assumption that one plot function will want to import trees, another GPs, etc. Unless we move to lazy imports, that would be against the current convention that importing sklearn is fairly minimal.

I do like Andy's idea of framing this discussion more clearly around likely candidates.

(In reply to Andreas Mueller <t3kcit@gmail.com>, Thu, 4 Apr 2019 at 00:10)
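One way to picture the lazy imports mentioned above is a small proxy that postpones the real import until an attribute is first used. This is only a sketch of the general technique, not how scikit-learn does or would implement it: LazyModule is a hypothetical helper, and the standard-library module "fractions" stands in for a heavy submodule such as a tree or GP module.

```python
import importlib


class LazyModule:
    """Defer importing a module until one of its attributes is first used."""

    def __init__(self, name):
        self._name = name
        self._module = None  # nothing imported through this proxy yet

    def __getattr__(self, attr):
        # Only called for names not found on the proxy, i.e. module members.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# "fractions" is a stdlib stand-in for a heavy submodule.
heavy = LazyModule("fractions")
loaded_before = heavy._module is not None

half = heavy.Fraction(1, 2)  # first member access triggers the import
loaded_after = heavy._module is not None
```

With this pattern, a flat plotting namespace could list many plot functions without importing every estimator module up front; the cost is paid only by the functions a user actually calls.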
This is not a strongly-held suggestion - but what about adopting YellowBrick as the plotting API for sklearn? Not sure how exactly the interaction would work - could be PRs to their library, or ask them to integrate into sklearn, or do a lock-step dance with versions but maintain separate teams? (I know it raises more questions than answers, but wanted to put it out there.)

(In reply to Joel Nothman <joel.nothman@gmail.com>, Wed, Apr 3, 2019, 4:07 PM)
I also think that the YellowBrick folks did a great job and that we should not reinvent the wheel, or at least that we should have a clear idea of how we differ in scope from YellowBrick.

My 2c
Alex

(In reply to Eric Ma <ericmajinglong@gmail.com>, Thu, Apr 4, 2019, 1:02 AM)
This is not a strongly-held suggestion - but what about adopting YellowBrick as the plotting API for sklearn? Not sure how exactly the interaction would work - could be PRs to their library, or ask them to integrate into sklearn, or do a lock-step dance with versions but maintain separate teams? (I know it raises more questions than answers, but wanted to put it out there.)
On Wed, Apr 3, 2019 at 4:07 PM Joel Nothman <joel.nothman@gmail.com> wrote:
With option 1, sklearn.plot is likely to import large chunks of the library (particularly, but not exclusively, if the plotting function "does the work" as Andy suggests). This is under the assumption that one plot function will want to import trees, another GPs, etc. Unless we move to lazy imports, that would be against the current convention that importing sklearn is fairly minimal.
I do like Andy's idea of framing this discussion more clearly around likely candidates.
On Thu, 4 Apr 2019 at 00:10, Andreas Mueller <t3kcit@gmail.com> wrote:
I think what was not clear from the question is that there is actually quite different kinds of plotting functions, and many of these are tied to existing code.
Right now we have some that are specific to trees (plot_tree) and to gradient boosting (plot_partial_dependence).
I think we want more general functions, and plot_partial_dependence has been extended to general estimators.
However, the plotting functions might be generic wrt the estimator, but relate to a specific function, say plotting results of GridSearchCV. Then one might argue that having the plotting function close to GridSearchCV might make sense. Similarly for plotting partial dependence plots and feature importances, it might be a bit strange to have the plotting functions not next to the functions that compute these. Another question would be is whether the plotting functions also "do the work" in some cases: Do we want plot_partial_dependence also to compute the partial dependence? (I would argue yes but either way the result is a bit
In that case you have somewhat of the same functionality in two different modules, unless you also put the "compute partial dependence" function in the plotting module as well, which is a bit strange.
Maybe we could inform this discussion by listing candidate plotting functions, and also considering whether they "do the work" and where the "work" function is.
Other examples are plotting the confusion matrix, which probably should also compute the confusion matrix (it's fast and so that would be convenient), and so it would "duplicate" functionality from the metrics module.
Plotting learning curves and validation curves should probably not do the work as it's pretty involved, and so someone would need to import the learning and validation curves from model selection, and then the plotting functions from a plotting module.
Calibrations curves and P/R curves and roc curves are also pretty fast to compute (and passing around the arguments is somewhat error prone) so I would say the plotting functions for these should do the work as well.
Anyway, you can see that many plotting functions are actually associated with functions in existing modules and the interactions are a bit unclear.
The only plotting functions I haven't mentioned so far that I thought about in the past are "2d scatter" and "plot decision function". These would be kind of generic, but mostly used in the examples. Though having a discrete 2d scatter function would be pretty nice (plt.scatter doesn't allow legends and makes it hard to use qualitative color maps).
I think I would vote for option (1), "sklearn.plot.plot_zzz" but the case is not really that clear.
Cheers,
Andy
On 4/3/19 7:35 AM, Roman Yurchak via scikit-learn wrote:
+1 for options 1 and +0.5 for 3. Do we anticipate that many plotting functions will be added? If it's just a dozen or less, putting them all into a single namespace sklearn.plot might be easier.
This also would avoid discussion about where to put some generic plotting functions (e.g.
https://github.com/scikit-learn/scikit-learn/issues/13448#issuecomment-47834... ).
Roman
On 03/04/2019 12:06, Trevor Stephens wrote:
I think #1 if any of these... Plotting functions should hopefully be
as
general as possible, so tagging with a specific type of estimator will, in some scikit-learn utopia, be unnecessary.
If a general plotter is built, where does it live in other estimator-specific namespace options? Feels awkward to put it under every estimator's namespace.
Then again, there might be a #4 where there is no plot module and plotting classes live under groups of utilities like introspection, cross-validation or something?...
I would argue that sklearn users would benefit from having solutions in scikit-learn. The yellowbrick API is quite different from the approaches we discussed. If we can reuse their implementations, I think we should do so and credit where we can. Having plotting in sklearn is also likely to attract more contributors, and we would have more eyes for doing reviews.
Sent from phone. Please excuse spelling and brevity.
On Thu, Apr 4, 2019, 05:43 Alexandre Gramfort <alexandre.gramfort@inria.fr> wrote:
I also think that YellowBrick folks did a great job and that we should not reinvent the wheel, or at least have a clear idea of how we differ in scope with respect to YellowBrick.
my 2c
Alex
On Thu, Apr 4, 2019 at 1:02 AM Eric Ma <ericmajinglong@gmail.com> wrote:
This is not a strongly-held suggestion - but what about adopting YellowBrick as the plotting API for sklearn? Not sure how exactly the interaction would work - could be PRs to their library, or ask them to integrate into sklearn, or do a lock-step dance with versions but maintain separate teams? (I know it raises more questions than answers, but wanted to put it out there.)
On Wed, Apr 3, 2019 at 4:07 PM Joel Nothman <joel.nothman@gmail.com> wrote:
With option 1, sklearn.plot is likely to import large chunks of the library (particularly, but not exclusively, if the plotting function "does the work" as Andy suggests). This is under the assumption that one plot function will want to import trees, another GPs, etc. Unless we move to lazy imports, that would be against the current convention that importing sklearn is fairly minimal.
I do like Andy's idea of framing this discussion more clearly around likely candidates.
On Thu, 4 Apr 2019 at 00:10, Andreas Mueller <t3kcit@gmail.com> wrote:
I think what was not clear from the question is that there are actually quite different kinds of plotting functions, and many of these are tied to existing code.
Right now we have some that are specific to trees (plot_tree) and to gradient boosting (plot_partial_dependence).
I think we want more general functions, and plot_partial_dependence has been extended to general estimators.
However, the plotting functions might be generic wrt the estimator, but relate to a specific function, say plotting results of GridSearchCV. Then one might argue that having the plotting function close to GridSearchCV might make sense.
Similarly for plotting partial dependence plots and feature importances, it might be a bit strange to have the plotting functions not next to the functions that compute these.
Another question would be whether the plotting functions also "do the work" in some cases: Do we want plot_partial_dependence also to compute the partial dependence? (I would argue yes but either way the result is a bit strange). In that case you have somewhat of the same functionality in two different modules, unless you also put the "compute partial dependence" function in the plotting module as well, which is a bit strange.
Maybe we could inform this discussion by listing candidate plotting functions, and also considering whether they "do the work" and where the "work" function is.
Other examples are plotting the confusion matrix, which probably should also compute the confusion matrix (it's fast and so that would be convenient), and so it would "duplicate" functionality from the metrics module.
Plotting learning curves and validation curves should probably not do the work as it's pretty involved, and so someone would need to import the learning and validation curves from model selection, and then the plotting functions from a plotting module.
Calibration curves, P/R curves and ROC curves are also pretty fast to compute (and passing around the arguments is somewhat error-prone), so I would say the plotting functions for these should do the work as well.
Anyway, you can see that many plotting functions are actually associated with functions in existing modules and the interactions are a bit unclear.
The only plotting functions I haven't mentioned so far that I thought about in the past are "2d scatter" and "plot decision function". These would be kind of generic, but mostly used in the examples. Though having a discrete 2d scatter function would be pretty nice (plt.scatter doesn't allow legends and makes it hard to use qualitative color maps).
I think I would vote for option (1), "sklearn.plot.plot_zzz" but the case is not really that clear.
Cheers,
Andy
Well it would certainly be a low-cost effort improvement if we demonstrated yellowbrick in our examples.
I'm with Andreas on this. As a user, I would prefer to see this as part of sklearn with the usual sklearn api. If we want static matplotlib-style images, reusing (with credit) some of the yellowbrick implementations is a good idea.
Would we consider plotly-based visualizations? I've been doing my own plotting in plotly for the last month, and can't imagine going back to static matplotlib plots...
Andrew
On Thu, Apr 4, 2019 at 3:26 PM Andreas Mueller <t3kcit@gmail.com> wrote:
I would argue that sklearn users would benefit from having solutions in scikit-learn. The yellowbrick API is quite different from the approaches we discussed. If we can reuse their implementations, I think we should do so and credit where we can. Having plotting in sklearn is also likely to attract more contributors, and we would have more eyes for doing reviews.
Sent from phone. Please excuse spelling and brevity.
Hi,
I suppose you won't want to rewrite all examples if you choose plotly-based viz, so this help page about converting matplotlib figures or code to plotly might help:
https://plot.ly/matplotlib/getting-started/
I hope it works, the doc page looks a bit old.
Cheers
Emma
On Sun, Apr 07, 2019 at 10:08:24AM +0100, Andrew Howe wrote:
I'm with Andreas on this. As a user, I would prefer to see this as part of sklearn with the usual sklearn api. If we want static matplotlib-style images, reusing (with credit) some of the yellowbrick implementations is a good idea.
Would we consider plotly-based visualizations? I've been doing my own plotting in plotly for the last month, and can't imagine going back to static matplotlib plots...
Andrew
participants (10)
- Alexandre Gramfort
- Andreas Mueller
- Andrew Howe
- Brown J.B.
- Emmanuelle Gouillart
- Eric Ma
- Hanmin Qin
- Joel Nothman
- Roman Yurchak
- Trevor Stephens