[scikit-learn] API Discussion: Where shall we put the plotting functions?

Andrew Howe ahowe42 at gmail.com
Sun Apr 7 05:08:24 EDT 2019


I'm with Andreas on this. As a user, I would prefer to see this as part of
sklearn with the usual sklearn api. If we want static matplotlib-style
images, reusing (with credit) some of the yellowbrick implementations is a
good idea.

Would we consider plotly-based visualizations? I've been doing my own
plotting in plotly for the last month, and can't imagine going back to
static matplotlib plots...

Andrew

<~~~~~~~~~~~~~~~~~~~~~~~~~~~>
J. Andrew Howe, PhD
LinkedIn Profile <http://www.linkedin.com/in/ahowe42>
ResearchGate Profile <http://www.researchgate.net/profile/John_Howe12/>
Open Researcher and Contributor ID (ORCID)
<http://orcid.org/0000-0002-3553-1990>
Github Profile <http://github.com/ahowe42>
Personal Website <http://www.andrewhowe.com>
I live to learn, so I can learn to live. - me
<~~~~~~~~~~~~~~~~~~~~~~~~~~~>


On Thu, Apr 4, 2019 at 3:26 PM Andreas Mueller <t3kcit at gmail.com> wrote:

> I would argue that sklearn users would benefit in having solutions in
> scikit-learn. The yellowbrick api is quite different from the approaches we
> discussed. If we can reuse their implementations I think we should do so
> and credit where we can.
> Having plotting in sklearn is also likely to attract more contributors and
> we have more eyes for doing reviews.
>
> Sent from phone. Please excuse spelling and brevity.
>
> On Thu, Apr 4, 2019, 05:43 Alexandre Gramfort <alexandre.gramfort at inria.fr>
> wrote:
>
>> I also think that YellowBrick folks did a great job and that we should
>> not reinvent the wheel, or at least have a clear idea of how we differ in
>> scope with respect to YellowBrick.
>>
>> my 2c
>>
>> Alex
>>
>>
>> On Thu, Apr 4, 2019 at 1:02 AM Eric Ma <ericmajinglong at gmail.com> wrote:
>>
>>> This is not a strongly-held suggestion - but what about adopting
>>> YellowBrick as the plotting API for sklearn? Not sure how exactly the
>>> interaction would work - could be PRs to their library, or ask them to
>>> integrate into sklearn, or do a lock-step dance with versions but maintain
>>> separate teams? (I know it raises more questions than answers, but wanted
>>> to put it out there.)
>>>
>>> On Wed, Apr 3, 2019 at 4:07 PM Joel Nothman <joel.nothman at gmail.com>
>>> wrote:
>>>
>>>> With option 1, sklearn.plot is likely to import large chunks of the
>>>> library (particularly, but not exclusively, if the plotting function
>>>> "does the work" as Andy suggests). This is under the assumption that
>>>> one plot function will want to import trees, another GPs, etc. Unless
>>>> we move to lazy imports, that would be against the current convention
>>>> that importing sklearn is fairly minimal.
>>>>
>>>> I do like Andy's idea of framing this discussion more clearly around
>>>> likely candidates.
>>>>
>>>> On Thu, 4 Apr 2019 at 00:10, Andreas Mueller <t3kcit at gmail.com> wrote:
>>>> >
>>>> > I think what was not clear from the question is that there are actually
>>>> > quite different kinds of plotting functions, and many of these are tied
>>>> > to existing code.
>>>> >
>>>> > Right now we have some that are specific to trees (plot_tree) and to
>>>> > gradient boosting (plot_partial_dependence).
>>>> >
>>>> > I think we want more general functions, and plot_partial_dependence has
>>>> > been extended to general estimators.
>>>> >
>>>> > However, the plotting functions might be generic wrt the estimator but
>>>> > relate to a specific function, say plotting results of GridSearchCV.
>>>> > Then one might argue that having the plotting function close to
>>>> > GridSearchCV might make sense.
>>>> > Similarly for plotting partial dependence plots and feature importances,
>>>> > it might be a bit strange to have the plotting functions not next to the
>>>> > functions that compute these.
>>>> > Another question is whether the plotting functions also "do the work"
>>>> > in some cases:
>>>> > Do we want plot_partial_dependence also to compute the partial
>>>> > dependence? (I would argue yes but either way the result is a bit strange).
>>>> > In that case you have somewhat of the same functionality in two
>>>> > different modules, unless you also put the "compute partial dependence"
>>>> > function in the plotting module as well,
>>>> > which is a bit strange.
>>>> >
>>>> > Maybe we could inform this discussion by listing candidate plotting
>>>> > functions, and also considering whether they "do the work" and where the
>>>> > "work" function is.
>>>> >
>>>> > Other examples are plotting the confusion matrix, which probably should
>>>> > also compute the confusion matrix (it's fast and so that would be
>>>> > convenient), and so it would "duplicate" functionality from the metrics
>>>> > module.
>>>> >
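A minimal sketch of a plotting function that "does the work", assuming matplotlib is installed. The names `confusion_counts` and `plot_confusion` are hypothetical and not part of scikit-learn; in sklearn itself the computation would presumably come from the metrics module rather than being reimplemented.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true labels, columns predicted."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def plot_confusion(y_true, y_pred, labels, ax=None):
    """Hypothetical helper: computes the matrix, then draws it."""
    matrix = confusion_counts(y_true, y_pred, labels)
    import matplotlib.pyplot as plt  # deferred so the computation works without it

    if ax is None:
        _, ax = plt.subplots()
    ax.imshow(matrix, cmap="Blues")
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    return matrix, ax
```

The caller passes raw labels only, so there is no intermediate result to hand around incorrectly.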
>>>> > Plotting learning curves and validation curves should probably not do
>>>> > the work as it's pretty involved, and so someone would need to import
>>>> > the learning and validation curves from model selection, and then the
>>>> > plotting functions from a plotting module.
>>>> >
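By contrast, a plotting helper that does not do the work would only accept precomputed results. The sketch below assumes scores shaped like the output of learning_curve from model selection (one row of per-fold scores per training size) and that matplotlib is installed; `summarize_scores` and `plot_learning_curve` are hypothetical names, not an existing sklearn API.

```python
from statistics import mean, pstdev


def summarize_scores(scores):
    """Return (means, stds) with one entry per row of per-fold scores."""
    means = [mean(row) for row in scores]
    stds = [pstdev(row) for row in scores]
    return means, stds


def plot_learning_curve(train_sizes, train_scores, test_scores, ax=None):
    """Hypothetical helper: draws precomputed curves, runs no fitting itself."""
    import matplotlib.pyplot as plt  # deferred import

    if ax is None:
        _, ax = plt.subplots()
    for name, scores in [("train", train_scores), ("validation", test_scores)]:
        means, stds = summarize_scores(scores)
        lower = [m - s for m, s in zip(means, stds)]
        upper = [m + s for m, s in zip(means, stds)]
        ax.plot(train_sizes, means, marker="o", label=name)
        ax.fill_between(train_sizes, lower, upper, alpha=0.2)
    ax.set_xlabel("Training examples")
    ax.set_ylabel("Score")
    ax.legend()
    return ax
```

Here the expensive cross-validation stays with the caller, and the plot function can be rerun on the same results at no extra cost.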
>>>> > Calibration curves, P/R curves and ROC curves are also pretty fast
>>>> > to compute (and passing around the arguments is somewhat error prone), so
>>>> > I would say the plotting functions for these should do the work as well.
>>>> >
>>>> > Anyway, you can see that many plotting functions are actually associated
>>>> > with functions in existing modules and the interactions are a bit unclear.
>>>> >
>>>> > The only plotting functions I haven't mentioned so far that I thought
>>>> > about in the past are "2d scatter" and "plot decision function". These
>>>> > would be kind of generic, but mostly used in the examples.
>>>> > Though having a discrete 2d scatter function would be pretty nice
>>>> > (plt.scatter doesn't allow legends and makes it hard to use qualitative
>>>> > color maps).
>>>> >
>>>> >
>>>> > I think I would vote for option (1), "sklearn.plot.plot_zzz" but the
>>>> > case is not really that clear.
>>>> >
>>>> > Cheers,
>>>> >
>>>> > Andy
>>>> >
>>>> > On 4/3/19 7:35 AM, Roman Yurchak via scikit-learn wrote:
>>>> > > +1 for option 1 and +0.5 for 3. Do we anticipate that many plotting
>>>> > > functions will be added? If it's just a dozen or less, putting them all
>>>> > > into a single namespace sklearn.plot might be easier.
>>>> > >
>>>> > > This also would avoid discussion about where to put some generic
>>>> > > plotting functions (e.g.
>>>> > > https://github.com/scikit-learn/scikit-learn/issues/13448#issuecomment-478341479).
>>>> > >
>>>> > > Roman
>>>> > >
>>>> > > On 03/04/2019 12:06, Trevor Stephens wrote:
>>>> > >> I think #1 if any of these... Plotting functions should hopefully be as
>>>> > >> general as possible, so tagging with a specific type of estimator will,
>>>> > >> in some scikit-learn utopia, be unnecessary.
>>>> > >>
>>>> > >> If a general plotter is built, where does it live in other
>>>> > >> estimator-specific namespace options? Feels awkward to put it under
>>>> > >> every estimator's namespace.
>>>> > >>
>>>> > >> Then again, there might be a #4 where there is no plot module and
>>>> > >> plotting classes live under groups of utilities like introspection,
>>>> > >> cross-validation or something?...
>>>> > >>
>>>> > >> On Wed, Apr 3, 2019 at 8:54 PM Andrew Howe <ahowe42 at gmail.com
>>>> > >> <mailto:ahowe42 at gmail.com>> wrote:
>>>> > >>
>>>> > >>      My preference would be for (1). I don't think the sub-namespace in
>>>> > >>      (2) is necessary, and don't like (3), as I would prefer the plotting
>>>> > >>      functions to be all in the same namespace sklearn.plot.
>>>> > >>
>>>> > >>      Andrew
>>>> > >>
>>>> > >>
>>>> > >>      On Tue, Apr 2, 2019 at 3:40 PM Hanmin Qin <qinhanmin2005 at sina.com
>>>> > >>      <mailto:qinhanmin2005 at sina.com>> wrote:
>>>> > >>
>>>> > >>          See https://github.com/scikit-learn/scikit-learn/issues/13448
>>>> > >>
>>>> > >>          We've introduced several plotting functions (e.g., plot_tree and
>>>> > >>          plot_partial_dependence) and will introduce more (e.g.,
>>>> > >>          plot_decision_boundary) in the future. Consequently, we need to
>>>> > >>          decide where to put these functions. Currently, there are 3
>>>> > >>          proposals:
>>>> > >>
>>>> > >>          (1) sklearn.plot.plot_YYY (e.g., sklearn.plot.plot_tree)
>>>> > >>
>>>> > >>          (2) sklearn.plot.XXX.plot_YYY (e.g., sklearn.plot.tree.plot_tree)
>>>> > >>
>>>> > >>          (3) sklearn.XXX.plot.plot_YYY (e.g.,
>>>> > >>          sklearn.tree.plot.plot_tree, note that we won't support from
>>>> > >>          sklearn.XXX import plot_YYY)
>>>> > >>
>>>> > >>          Joel Nothman, Gael Varoquaux and I decided to post it on the
>>>> > >>          mailing list to invite opinions.
>>>> > >>
>>>> > >>          Thanks
>>>> > >>
>>>> > >>          Hanmin Qin
>>>> > >>          _______________________________________________
>>>> > >>          scikit-learn mailing list
>>>> > >>          scikit-learn at python.org <mailto:scikit-learn at python.org>
>>>> > >>          https://mail.python.org/mailman/listinfo/scikit-learn
>>>> > >>