<div dir="ltr"><div><div>Thanks for this email. It is always nice to hear about success stories.<br><br>I assume the guilty party is Issam Laradji, as you can see from his Google Summer of Code blog post:<br><br><a href="http://issamlaradji.blogspot.jp/2014/06/week-3-gsoc-2014-extending-neural.html">http://issamlaradji.blogspot.jp/2014/06/week-3-gsoc-2014-extending-neural.html</a><br><br>L-BFGS is indeed usually a good default choice for medium-scale datasets. It doesn't require any step size tuning and I found recently that it works well for poorly conditioned problems.<br><br></div>You can also see a blog post by Nicolas Le Roux praising L-BFGS here:<br><br><a href="http://labs.criteo.com/2014/09/poh-part-3-distributed-optimization/">http://labs.criteo.com/2014/09/poh-part-3-distributed-optimization/</a><br><br></div>Mathieu<br><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Aug 26, 2017 at 12:40 AM, Dr. Mario Michael Krell <span dir="ltr"><<a href="mailto:mario.michael.krell@gmail.com" target="_blank">mario.michael.krell@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">To whoever programmed the MLPClassifier (with the L-BFGS solver),<div><br></div><div>I just wanted to personally thank you and if I get your name(s), I would mention it/them in my paper additionally to the mandatory sklearn citation.</div><div><br></div><div>I hope that sklearn will be keeping this algorithm forever in their library despite the increasing amount of established deep learning libraries that seem to make this code obsolete. For my small scale, more theoretic analysis, it worked much better than any other algorithm and I would not have gotten such surprising results. Due to the high quality implementation, the integration of a much better solver than SGD, and the respective good documentation, I could show empirically how the VC dimension and another property of MLPs (MacKay dimension) actually scale linear with the number of edges in the respective graph which helped us to provide a new much more strict upper bound (<a href="https://arxiv.org/abs/1708.06019" target="_blank">https://arxiv.org/abs/1708.<wbr>06019</a>). This would have not been possible with other implementations. If there is an interest by the developers, I could try to contribute a tutorial documentation for sklearn. Just let me know.</div><div><br></div><div>Thank you a lot!!!</div><div><br></div><div>Best,</div><div><br></div><div>Mario</div></div><br>______________________________<wbr>_________________<br>