[scikit-learn] scikit-learn Digest, Vol 43, Issue 25

João André jandre at lnec.pt
Wed Oct 16 10:16:50 EDT 2019


Dear Scikit-learn,

This is my first message in this community!

I am writing because I think "model complexity" and "prediction accuracy"
are two separate properties which cannot, in principle, be compared
directly: a third variable is missing, namely the data.
If the initial data set covers the entire true range of possible data,
then a complex model will fit the variable under study with a prediction
accuracy equal to or better than that of any less complex model. If the
data set is not representative, then more complex models may overfit, and
there is a chance that simpler models will predict better on unseen data.
The quality of the data is therefore critical for judging how good your
model will be.
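To make the point concrete, here is a minimal sketch (my own illustration; the data, models, and polynomial degree are invented for the example): a flexible model fitted on a small, noisy sample can lose to a simpler one on unseen data.

```python
# Sketch: the true relationship is linear, but the training sample is tiny
# and noisy. A degree-9 polynomial (complex) nearly interpolates the noise,
# while plain linear regression (simple) generalises better on unseen x.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X_train = rng.uniform(0, 1, size=(10, 1))
y_train = 2 * X_train.ravel() + rng.normal(scale=0.3, size=10)

# "Unseen" data covering the full true range, without noise.
X_test = np.linspace(0, 1, 200).reshape(-1, 1)
y_test = 2 * X_test.ravel()

simple = LinearRegression().fit(X_train, y_train)
complex_ = make_pipeline(PolynomialFeatures(degree=9),
                         LinearRegression()).fit(X_train, y_train)

mse_simple = mean_squared_error(y_test, simple.predict(X_test))
mse_complex = mean_squared_error(y_test, complex_.predict(X_test))
print("simple:", mse_simple, "complex:", mse_complex)
```

With a large, representative training set the gap closes again, which is the other half of the argument.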
Hope this helps.
João


João André
Civil Engineer, M.Sc., Ph.D.
Structures Department
National Laboratory for Civil Engineering
LNEC, Av. Brasil 101, 1700-066 Lisbon, Portugal
Web: http://www.lnec.pt/
Skype ID: jpcgandre
Phone: (+351) 218 443 355


On Wed, 16 Oct 2019 at 15:05, Gael Varoquaux <gael.varoquaux at normalesup.org>
wrote:

> On Sun, Oct 13, 2019 at 07:40:11PM +0900, Brown J.B. via scikit-learn
> wrote:
> > Please, show respect and refinement when addressing the contributors and
> users of
> > scikit-learn.
>
> I believe that Mike simply misread. It's something that happens (it
> happens a lot to me).
>
> No harm on my side, and thanks for clarifying my overly short reply.
>
> G
>
> > Gael's statement is perfect -- complexity does not imply better
> prediction.
> > The choice of estimator (and algorithm) depends on the structure of the
> model
> > desired for the data presented.
> > Estimator superiority cannot be proven in a context- and/or data-agnostic
> > fashion.
>
> > J.B.
>
>
> > 2019年10月13日(日) 6:13 Mike Smith <javaeurusd at gmail.com>:
>
> >     "Second complexity does not
> >     > imply better prediction. "
>
> >     Complexity doesn't imply prediction? Perhaps you're having a
> translation
> >     error.
>
> >     On Sat, Oct 12, 2019 at 2:04 PM <scikit-learn-request at python.org>
> wrote:
>
> >         Send scikit-learn mailing list submissions to
> >                 scikit-learn at python.org
>
> >         To subscribe or unsubscribe via the World Wide Web, visit
> >                 https://mail.python.org/mailman/listinfo/scikit-learn
> >         or, via email, send a message with subject or body 'help' to
> >                 scikit-learn-request at python.org
>
> >         You can reach the person managing the list at
> >                 scikit-learn-owner at python.org
>
> >         When replying, please edit your Subject line so it is more
> specific
> >         than "Re: Contents of scikit-learn digest..."
>
>
> >         Today's Topics:
>
> >            1. Re: scikit-learn Digest, Vol 43, Issue 24 (Mike Smith)
>
>
> >
>  ----------------------------------------------------------------------
>
> >         Message: 1
> >         Date: Sat, 12 Oct 2019 14:04:12 -0700
> >         From: Mike Smith <javaeurusd at gmail.com>
> >         To: scikit-learn at python.org
> >         Subject: Re: [scikit-learn] scikit-learn Digest, Vol 43, Issue 24
> >         Message-ID:
> >                 <CAEWZffD-hNviFkyxuM8CgDR3XSWOyn=
> >         4LRy2NJvjwvVr4RgobQ at mail.gmail.com>
> >         Content-Type: text/plain; charset="utf-8"
>
> >         "...  > If I should expect good results on a pc, scikit says that
> >         needing
> >         gpu power is
> >         > obsolete, since certain scikit models perform better (than ml
> >         designed
> >         for gpu)
> >         > that are not designed for gpu, for that reason. Is this true?"
>
> >         Where do you see this written? I think that you are looking for
> overly
> >         simple stories that are not true."
>
> >         Gael, see the below from the scikit-learn FAQ. You can also find
> this
> >         yourself at the main FAQ:
>
> >         [image: 2019-10-12 14_00_05-Frequently Asked Questions —
> scikit-learn
> >         0.21.3 documentation.png]
>
>
> >         On Sat, Oct 12, 2019 at 9:03 AM <scikit-learn-request at python.org
> >
> >         wrote:
>
>
>
> >         > Today's Topics:
>
> >         >    1. Re: Is scikit-learn implying neural nets are the best
> >         >       regressor? (Gael Varoquaux)
>
>
>
> >
>  ----------------------------------------------------------------------
>
> >         > Message: 1
> >         > Date: Fri, 11 Oct 2019 13:34:33 -0400
> >         > From: Gael Varoquaux <gael.varoquaux at normalesup.org>
> >         > To: Scikit-learn mailing list <scikit-learn at python.org>
> >         > Subject: Re: [scikit-learn] Is scikit-learn implying neural
> nets are
> >         >         the best regressor?
> >         > Message-ID: <
> 20191011173433.bbywiqnwjjpvsi4r at phare.normalesup.org>
> >         > Content-Type: text/plain; charset=iso-8859-1
>
> >         > On Fri, Oct 11, 2019 at 10:10:32AM -0700, Mike Smith wrote:
> >         > > In other words, according to that arrangement, is
> scikit-learn
> >         implying
> >         > that
> >         > > section 1.17 is the best regressor out of the listed, 1.1 to
> 1.17?
>
> >         > No.
>
> >         > First they are not ordered in order of complexity (Naive Bayes
> is
> >         > arguably simpler than Gaussian Processes). Second complexity
> does not
> >         > imply better prediction.
>
> >         > > If I should expect good results on a pc, scikit says that
> needing
> >         gpu
> >         > power is
> >         > > obsolete, since certain scikit models perform better (than ml
> >         designed
> >         > for gpu)
> >         > > that are not designed for gpu, for that reason. Is this true?
>
> >         > Where do you see this written? I think that you are looking for
> >         overly
> >         > simple stories that are not true.
>
> >         > > How much hardware is a practical expectation for running the
> best
> >         > > scikit models and getting the best results?
>
> >         > This is too vague a question for which there is no answer.
>
> >         > Gaël
>
> >         > > On Fri, Oct 11, 2019 at 9:02 AM <
> scikit-learn-request at python.org>
> >         wrote:
>
>
>
> >         > >     Today's Topics:
>
> >         > >        1. Re: logistic regression results are not stable
> between
> >         > >           solvers (Andreas Mueller)
>
>
>
> >         >
> >
>  ----------------------------------------------------------------------
>
> >         > >     Message: 1
> >         > >     Date: Fri, 11 Oct 2019 15:42:58 +0200
> >         > >     From: Andreas Mueller <t3kcit at gmail.com>
> >         > >     To: scikit-learn at python.org
> >         > >     Subject: Re: [scikit-learn] logistic regression results
> are not
> >         > stable
> >         > >             between solvers
> >         > >     Message-ID: <
> d55949d6-3355-f892-f6b3-030edf1c7947 at gmail.com>
> >         > >     Content-Type: text/plain; charset="utf-8";
> Format="flowed"
>
>
>
> >         > >     On 10/10/19 1:14 PM, Benoît Presles wrote:
>
> >         > >     > Thanks for your answers.
>
> >         > >     > On my real data, I do not have so many samples. I have
> a bit
> >         more
> >         > than
> >         > >     > 200 samples in total and I also would like to get some
> >         results with
> >         > >     > unpenalized logistic regression.
> >         > >     > What do you suggest? Should I switch to the lbfgs
> solver?
> >         > >     Yes.
> >         > >     > Am I sure that with this solver I will not have any
> >         convergence
> >         > issue
> >         > >     > and always get the correct result? Indeed, I did not get
> any
> >         > convergence
> >         > >     > warning with saga, so I thought everything was fine. I
> >         noticed some
> >         > >     > issues only when I decided to test several solvers.
> Without
> >         > comparing
> >         > >     > the results across solvers, how to be sure that the
> >         optimisation
> >         > goes
> >         > >     > well? Shouldn't scikit-learn warn the user somehow if
> it is
> >         not
> >         > the case?
> >         > >     We should attempt to warn in the SAGA solver if it
> doesn't
> >         converge.
> >         > >     That it doesn't raise a convergence warning should
> probably be
> >         > >     considered a bug.
> >         > >     It uses the maximum weight change as a stopping
> criterion right
> >         now.
> >         > >     We could probably compute the dual objective once in the
> end to
> >         see
> >         > if
> >         > >     we converged, right? Or is that not possible with SAGA?
> If not,
> >         we
> >         > might
> >         > >     want to caution that no convergence warning will be
> raised.
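Until such a warning is reliable, a user can check convergence by hand. A sketch (the dataset and settings below are made up for illustration; `n_iter_` and `max_iter` are standard scikit-learn estimator attributes):

```python
# Sketch: SAGA may stop at max_iter without warning, so inspect n_iter_
# after fitting; hitting max_iter suggests the fit did not converge.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=42)
clf = LogisticRegression(solver='saga', C=1e9, max_iter=20).fit(X, y)

# n_iter_ holds the number of epochs actually run (one entry here).
converged = clf.n_iter_[0] < clf.max_iter
print("epochs used:", clf.n_iter_[0], "converged:", bool(converged))
```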
>
>
> >         > >     > At last, I was using saga because I also wanted to do
> some
> >         feature
> >         > >     > selection by using l1 penalty which is not supported by
> >         lbfgs...
> >         > >     You can use liblinear then.
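For instance (a sketch; the synthetic data and the C value are my own choices, not Ben's setup):

```python
# Sketch: l1-penalised logistic regression with the liblinear solver,
# usable for feature selection since l1 drives coefficients to zero
# (lbfgs supports only l2 or no penalty).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)
clf = LogisticRegression(penalty='l1', solver='liblinear', C=0.1).fit(X, y)

n_selected = int(np.count_nonzero(clf.coef_))
print(n_selected, "of", clf.coef_.size, "features kept")
```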
>
>
>
> >         > >     > Best regards,
> >         > >     > Ben
>
>
> >         > >     > On 09/10/2019 at 23:39, Guillaume Lemaître wrote:
> >         > >     >> Oops, I did not see Roman's answer. Sorry about
> that. It
> >         is
> >         > coming
> >         > >     >> back to the same conclusion :)
>
> >         > >     >> On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître
> >         > >     >> <g.lemaitre58 at gmail.com <mailto:
> g.lemaitre58 at gmail.com>>
> >         wrote:
>
> >         > >     >>   Uhm, actually increasing to 10000 samples solves the
> >         convergence
> >         > >     issue.
> >         > >     >>   SAGA is most probably not designed to work with such
> a small
> >         > >     >>   sample size.
>
> >         > >     >>   On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître
> >         > >     >>   <g.lemaitre58 at gmail.com <mailto:
> g.lemaitre58 at gmail.com>>
> >         > wrote:
>
> >         > >     >>       I slightly changed the benchmark so that it uses
> >         a pipeline and
> >         > >     >>       plotted the coefficients:

> >         > >     >>       https://gist.github.com/glemaitre/
> >         > >     8fcc24bdfc7dc38ca0c09c56e26b9386
>
> >         > >     >>       I only see one of the 10 splits where SAGA is
> not
> >         > converging,
> >         > >     >>       otherwise the coefficients
> >         > >     >>       look very close (I don't attach the figure
> here but
> >         they
> >         > can
> >         > >     >>       be plotted using the snippet).
> >         > >     >>       So apart from this second split, the other
> >         differences
> >         > seem
> >         > >     >>       to be numerical instability.
>
> >         > >     >>       Where I do have some concern is the
> >         convergence
> >         > rate
> >         > >     >>       of SAGA, but I have no
> >         > >     >>       intuition as to whether this is normal or not.
>
> >         > >     >>       On Wed, 9 Oct 2019 at 23:22, Roman Yurchak
> >         > >     >>       <rth.yurchak at gmail.com <mailto:
> rth.yurchak at gmail.com
>
> >         > wrote:
>
> >         > >     >>             Ben,
>
> >         > >     >>             I can confirm your results with
> penalty='none'
> >         and
> >         > C=1e9.
> >         > >     >>             In both cases,
> >         > >     >>             you are running a mostly unpenalized
> logistic
> >         > >     >>             regression. Usually
> >         > >     >>             that's less numerically stable than with
> a small
> >         > >     >>             regularization,
> >         > >     >>             depending on the data collinearity.
>
> >         > >     >>             Running that same code with
> >         > >     >>              - a larger penalty (smaller C values)
> >         > >     >>              - or a larger number of samples
> >         > >     >>             yields for me the same coefficients (up
> to
> >         some
> >         > >     tolerance).
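Roman's observation can be checked directly; a sketch on synthetic data (the dataset and solver settings are my own choices for illustration):

```python
# Sketch: with moderate regularization (C=1.0) and enough samples,
# lbfgs and saga converge to essentially the same coefficients.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
coefs = {}
for solver in ("lbfgs", "saga"):
    clf = LogisticRegression(solver=solver, C=1.0, tol=1e-6, max_iter=10000)
    coefs[solver] = clf.fit(X, y).coef_.ravel()

max_diff = np.max(np.abs(coefs["lbfgs"] - coefs["saga"]))
print("max coefficient difference:", max_diff)
```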
>
> >         > >     >>             You can also see that SAGA convergence is
> not
> >         good by
> >         > the
> >         > >     >>             fact that it
> >         > >     >>             needs 196000 epochs/iterations to
> converge.
>
> >         > >     >>             Actually, I have often seen convergence
> issues
> >         with
> >         > SAG
> >         > >     >>             on small
> >         > >     >>             datasets (in unit tests), not fully sure
> why.
>
> >         > >     >>             --
> >         > >     >>             Roman
>
> >         > >     >>             On 09/10/2019 22:10, serafim loukas wrote:
> >         > >     >>             > The predictions across solvers are
> exactly the
> >         same
> >         > when
> >         > >     >>             I run the code.
> >         > >     >>             > I am using version 0.21.3. What is
> yours?
> >         > >     >>             >
> >         > >     >>             >
> >         > >     >>             > In [13]: import sklearn
> >         > >     >>             >
> >         > >     >>             > In [14]: sklearn.__version__
> >         > >     >>             > Out[14]: '0.21.3'
> >         > >     >>             >
> >         > >     >>             >
> >         > >     >>             > Serafeim
> >         > >     >>             >
> >         > >     >>             >
> >         > >     >>             >
> >         > >     >>             >> On 9 Oct 2019, at 21:44, Benoît Presles
> >         > >     >>             <benoit.presles at u-bourgogne.fr
> >         > >     >>             <mailto:benoit.presles at u-bourgogne.fr>
> >         > >     >>             >> <mailto:benoit.presles at u-bourgogne.fr
> >         > >     >>             <mailto:benoit.presles at u-bourgogne.fr>>>
> wrote:
> >         > >     >>             >>
> >         > >     >>             >> (y_pred_lbfgs==y_pred_saga).all() ==
> False
> >         > >     >>             >
> >         > >     >>             >
> >         > >     >>             >
> >         _______________________________________________
> >         > >     >>             > scikit-learn mailing list
> >         > >     >>             > scikit-learn at python.org <mailto:
> >         > scikit-learn at python.org>
> >         > >     >>             >
> >         > https://mail.python.org/mailman/listinfo/scikit-learn
> >         > >     >>             >
>
>
>
>
> >         > >     >>         --
> >         > >     >>         Guillaume Lemaitre
> >         > >     >>         Scikit-learn @ Inria Foundation
> >         > >     >>         https://glemaitre.github.io/
>
>
>
>
>
>
> >         > >     ------------------------------
>
> >         > >     Subject: Digest Footer
>
> >         > >     _______________________________________________
> >         > >     scikit-learn mailing list
> >         > >     scikit-learn at python.org
> >         > >     https://mail.python.org/mailman/listinfo/scikit-learn
>
>
> >         > >     ------------------------------
>
> >         > >     End of scikit-learn Digest, Vol 43, Issue 21
> >         > >     ********************************************
>
>
>
>
> >         > --
> >         >     Gael Varoquaux
> >         >     Research Director, INRIA              Visiting professor,
> McGill
> >         >     http://gael-varoquaux.info            http://twitter.com/
> >         GaelVaroquaux
>
>
> >         > End of scikit-learn Digest, Vol 43, Issue 24
> >         > ********************************************
>
> >         -------------- next part --------------
> >         A non-text attachment was scrubbed...
> >         Name: 2019-10-12 14_00_05-Frequently Asked Questions —
> scikit-learn
> >         0.21.3 documentation.png
> >         Type: image/png
> >         Size: 26245 bytes
> >         Desc: not available
> >         URL: <http://mail.python.org/pipermail/scikit-learn/attachments/
> >         20191012/6959d075/attachment.png>
>
> >         End of scikit-learn Digest, Vol 43, Issue 25
> >         ********************************************
>
>
>
> --
>     Gael Varoquaux
>     Research Director, INRIA              Visiting professor, McGill
>     http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux
>

