[scikit-learn] API Discussion: Where shall we put the plotting functions?

Eric Ma ericmajinglong at gmail.com
Wed Apr 3 18:59:02 EDT 2019


This is not a strongly-held suggestion - but what about adopting
YellowBrick as the plotting API for sklearn? Not sure how exactly the
interaction would work - could be PRs to their library, or ask them to
integrate into sklearn, or do a lock-step dance with versions but maintain
separate teams? (I know it raises more questions than answers, but wanted
to put it out there.)

On Wed, Apr 3, 2019 at 4:07 PM Joel Nothman <joel.nothman at gmail.com> wrote:

> With option 1, sklearn.plot is likely to import large chunks of the
> library (particularly, but not exclusively, if the plotting function
> "does the work" as Andy suggests). This is under the assumption that
> one plot function will want to import trees, another GPs, etc. Unless
> we move to lazy imports, that would be against the current convention
> that importing sklearn is fairly minimal.
>
> I do like Andy's idea of framing this discussion more clearly around
> likely candidates.
>
> On Thu, 4 Apr 2019 at 00:10, Andreas Mueller <t3kcit at gmail.com> wrote:
> >
> > I think what was not clear from the question is that there are actually
> > quite different kinds of plotting functions, and many of these are tied
> > to existing code.
> >
> > Right now we have some that are specific to trees (plot_tree) and to
> > gradient boosting (plot_partial_dependence).
> >
> > I think we want more general functions, and plot_partial_dependence has
> > been extended to general estimators.
> >
> > However, the plotting functions might be generic with respect to the
> > estimator but relate to a specific function, say plotting the results
> > of GridSearchCV. Then one might argue that having the plotting function
> > close to GridSearchCV makes sense.
> > Similarly for plotting partial dependence plots and feature importances,
> > it might be a bit strange to have the plotting functions not next to the
> > functions that compute these.
> > Another question is whether the plotting functions also "do the
> > work" in some cases:
> > Do we want plot_partial_dependence to also compute the partial
> > dependence? (I would argue yes, but either way the result is a bit
> > strange.)
> > In that case you have somewhat of the same functionality in two
> > different modules, unless you also put the "compute partial dependence"
> > function in the plotting module, which is a bit strange.
> >
> > Maybe we could inform this discussion by listing candidate plotting
> > functions, and also considering whether they "do the work" and where the
> > "work" function is.
> >
> > Another example is plotting the confusion matrix, which probably should
> > also compute the confusion matrix (it's fast and so that would be
> > convenient), and so it would "duplicate" functionality from the metrics
> > module.
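A plotting function that "does the work", as Andy suggests for the confusion matrix, can stay thin. It can delegate the computation to sklearn.metrics.confusion_matrix and handle only the drawing. This is a minimal sketch with a hypothetical name (plot_confusion_matrix_sketch) and assumes matplotlib is installed. It returns the computed matrix alongside the Axes so callers do not have to compute it twice.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix


def plot_confusion_matrix_sketch(y_true, y_pred, labels=None, ax=None):
    """Hypothetical helper: compute the confusion matrix, then draw it.

    The metric itself comes from sklearn.metrics.confusion_matrix, so the
    plotting code does not re-implement it.
    """
    if labels is None:
        labels = np.unique(np.concatenate([y_true, y_pred]))
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    if ax is None:
        _, ax = plt.subplots()
    ax.imshow(cm, cmap="Blues")
    # Annotate each cell with its count (rows: true labels, columns: predicted).
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, str(cm[i, j]), ha="center", va="center")
    tick_labels = [str(label) for label in labels]
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(tick_labels)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(tick_labels)
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    # Return the computed matrix as well, so callers need not recompute it.
    return cm, ax


cm, ax = plot_confusion_matrix_sketch([0, 1, 1, 2], [0, 1, 2, 2])
```

Returning the matrix keeps the duplicated functionality Andy mentions to a single call into the metrics module, not a second implementation.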
> >
> > Plotting learning curves and validation curves should probably not do
> > the work as it's pretty involved, and so someone would need to import
> > the learning and validation curves from model selection, and then the
> > plotting functions from a plotting module.
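For learning curves, where the plot should not do the work, the split might look roughly like this. sklearn.model_selection.learning_curve computes the scores, and a separate, hypothetical plot_learning_curve_sketch only draws them. The dataset, estimator, and training sizes below are arbitrary choices for illustration. shuffle=True is passed because the iris data is sorted by label, and without shuffling the small training subsets could contain a single class.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve


def plot_learning_curve_sketch(train_sizes, train_scores, test_scores, ax=None):
    """Hypothetical helper: plot precomputed learning-curve results.

    Expects the arrays returned by sklearn.model_selection.learning_curve;
    the score arrays have shape (n_ticks, n_cv_folds) and are averaged over
    the folds before plotting.
    """
    if ax is None:
        _, ax = plt.subplots()
    ax.plot(train_sizes, train_scores.mean(axis=1), "o-", label="Training score")
    ax.plot(train_sizes, test_scores.mean(axis=1), "o-", label="Cross-validation score")
    ax.set_xlabel("Number of training samples")
    ax.set_ylabel("Score")
    ax.legend(loc="best")
    return ax


# The expensive part happens in model_selection; the plot only consumes it.
X, y = load_iris(return_X_y=True)
train_sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    cv=5,
    train_sizes=[30, 60, 120],
    shuffle=True,
    random_state=0,
)
ax = plot_learning_curve_sketch(train_sizes, train_scores, test_scores)
```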
> >
> > Calibration curves, P/R curves, and ROC curves are also pretty fast
> > to compute (and passing around the arguments is somewhat error prone) so
> > I would say the plotting functions for these should do the work as well.
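For ROC curves, a plotting function can take raw labels and scores and compute the curve itself. That avoids passing fpr/tpr arrays around by hand. Here is a minimal sketch under the same assumptions (hypothetical name plot_roc_curve_sketch, matplotlib available), built on sklearn.metrics.roc_curve and sklearn.metrics.auc:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve


def plot_roc_curve_sketch(y_true, y_score, ax=None):
    """Hypothetical helper: compute the ROC curve and its AUC, then draw them.

    Taking raw labels and scores, rather than fpr/tpr arrays, removes the
    error-prone step of passing intermediate results around by hand.
    """
    fpr, tpr, _ = roc_curve(y_true, y_score)
    roc_auc = auc(fpr, tpr)
    if ax is None:
        _, ax = plt.subplots()
    ax.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.2f})")
    ax.plot([0, 1], [0, 1], linestyle="--", color="gray")  # chance level, unlabeled
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend(loc="lower right")
    return roc_auc, ax


roc_auc, ax = plot_roc_curve_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A precision/recall or calibration variant would have the same shape, with the corresponding sklearn function doing the computation.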
> >
> > Anyway, you can see that many plotting functions are actually associated
> > with functions in existing modules, and the interactions are a bit
> > unclear.
> >
> > The only plotting functions I haven't mentioned so far that I thought
> > about in the past are "2d scatter" and "plot decision function". These
> > would be kind of generic, but mostly used in the examples.
> > Though having a discrete 2d scatter function would be pretty nice
> > (plt.scatter doesn't allow legends and makes it hard to use qualitative
> > color maps).
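A discrete 2d scatter helper along the lines Andy describes could draw each class with its own scatter call. That way every class gets its own legend entry and a color from a qualitative colormap such as tab10. This is a sketch with a hypothetical name (discrete_scatter_sketch), not an existing scikit-learn or matplotlib function.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def discrete_scatter_sketch(x, y, labels, ax=None, cmap="tab10"):
    """Hypothetical helper: 2d scatter plot with one legend entry per class.

    Each class is drawn by a separate scatter call, so matplotlib creates one
    legend handle per class; colors are taken in order from a qualitative
    (listed) colormap and wrap around if there are more classes than colors.
    """
    x, y, labels = np.asarray(x), np.asarray(y), np.asarray(labels)
    if ax is None:
        _, ax = plt.subplots()
    colors = plt.get_cmap(cmap).colors
    for i, label in enumerate(np.unique(labels)):
        mask = labels == label
        ax.scatter(x[mask], y[mask], color=colors[i % len(colors)], label=str(label))
    ax.legend()
    return ax


ax = discrete_scatter_sketch([0, 1, 2, 3], [0, 1, 0, 1], ["a", "b", "a", "c"])
```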
> >
> >
> > I think I would vote for option (1), "sklearn.plot.plot_zzz" but the
> > case is not really that clear.
> >
> > Cheers,
> >
> > Andy
> >
> > On 4/3/19 7:35 AM, Roman Yurchak via scikit-learn wrote:
> > > +1 for option 1 and +0.5 for option 3. Do we anticipate that many plotting
> > > functions will be added? If it's just a dozen or less, putting them all
> > > into a single namespace sklearn.plot might be easier.
> > >
> > > This would also avoid discussion about where to put some generic
> > > plotting functions (e.g.
> > > https://github.com/scikit-learn/scikit-learn/issues/13448#issuecomment-478341479).
> > >
> > > Roman
> > >
> > > On 03/04/2019 12:06, Trevor Stephens wrote:
> > >> I think #1 if any of these... Plotting functions should hopefully be as
> > >> general as possible, so tagging with a specific type of estimator will,
> > >> in some scikit-learn utopia, be unnecessary.
> > >>
> > >> If a general plotter is built, where does it live in the other,
> > >> estimator-specific namespace options? Feels awkward to put it under
> > >> every estimator's namespace.
> > >>
> > >> Then again, there might be a #4 where there is no plot module and
> > >> plotting classes live under groups of utilities like introspection,
> > >> cross-validation or something?...
> > >>
> > >> On Wed, Apr 3, 2019 at 8:54 PM Andrew Howe <ahowe42 at gmail.com
> > >> <mailto:ahowe42 at gmail.com>> wrote:
> > >>
> > >>      My preference would be for (1). I don't think the sub-namespace in
> > >>      (2) is necessary, and I don't like (3), as I would prefer the
> > >>      plotting functions to all be in the same namespace, sklearn.plot.
> > >>
> > >>      Andrew
> > >>
> > >>      <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
> > >>      J. Andrew Howe, PhD
> > >>      LinkedIn Profile <http://www.linkedin.com/in/ahowe42>
> > >>      ResearchGate Profile <http://www.researchgate.net/profile/John_Howe12/>
> > >>      Open Researcher and Contributor ID (ORCID)
> > >>      <http://orcid.org/0000-0002-3553-1990>
> > >>      Github Profile <http://github.com/ahowe42>
> > >>      Personal Website <http://www.andrewhowe.com>
> > >>      I live to learn, so I can learn to live. - me
> > >>      <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
> > >>
> > >>
> > >>      On Tue, Apr 2, 2019 at 3:40 PM Hanmin Qin <qinhanmin2005 at sina.com
> > >>      <mailto:qinhanmin2005 at sina.com>> wrote:
> > >>
> > >>          See https://github.com/scikit-learn/scikit-learn/issues/13448
> > >>
> > >>          We've introduced several plotting functions (e.g., plot_tree and
> > >>          plot_partial_dependence) and will introduce more (e.g.,
> > >>          plot_decision_boundary) in the future. Consequently, we need to
> > >>          decide where to put these functions. Currently, there are 3
> > >>          proposals:
> > >>
> > >>          (1) sklearn.plot.plot_YYY (e.g., sklearn.plot.plot_tree)
> > >>
> > >>          (2) sklearn.plot.XXX.plot_YYY (e.g., sklearn.plot.tree.plot_tree)
> > >>
> > >>          (3) sklearn.XXX.plot.plot_YYY (e.g.,
> > >>          sklearn.tree.plot.plot_tree, note that we won't support from
> > >>          sklearn.XXX import plot_YYY)
> > >>
> > >>          Joel Nothman, Gael Varoquaux and I decided to post it on the
> > >>          mailing list to invite opinions.
> > >>
> > >>          Thanks
> > >>
> > >>          Hanmin Qin
> > >>          _______________________________________________
> > >>          scikit-learn mailing list
> > >>          scikit-learn at python.org <mailto:scikit-learn at python.org>
> > >>          https://mail.python.org/mailman/listinfo/scikit-learn