[scikit-learn] Optimization algorithms in scikit-learn

Gael Varoquaux gael.varoquaux at normalesup.org
Tue Sep 4 14:53:40 EDT 2018

This is out of the scope of scikit-learn, which is a toolkit meant to make
machine learning easier to use. Optimization is a component of machine
learning, but not one that is readily usable by itself.


On Tue, Sep 04, 2018 at 12:45:09PM -0600, Touqir Sajed wrote:
> Hi Andreas,

> Is there a particular reason why there is no general-purpose optimization
> module? Most of the optimizers (at least the first-order methods) are general
> purpose, since you just need to feed in the gradient. In some special cases,
> you probably need a problem-specific formulation for better performance. The
> advantage of SVRG is that you don't need to store the per-sample gradients,
> which costs storage of order number_of_weights * number_of_samples and is the
> main problem with SAG and SAGA. Thus, for most neural network models (and
> even non-NN models), using SAG and SAGA is infeasible on personal computers.
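As a rough sanity check of the storage claim above, here is a back-of-envelope calculation with purely illustrative sizes (the figures below are not from the thread):

```python
# Hypothetical sizes, chosen only to illustrate the scaling argument.
n_samples = 1_000_000
n_weights = 10_000
bytes_per_float = 8  # float64

# SAG/SAGA keep one full gradient per sample for a general (non-linear) model,
# so the gradient table scales as n_samples * n_weights.
table_gb = n_samples * n_weights * bytes_per_float / 1e9
print(f"gradient table: {table_gb:.0f} GB")  # 80 GB, infeasible on a typical PC
```

For linear models scikit-learn's SAG/SAGA solvers exploit the problem structure to store only one scalar per sample, which is why they remain practical there; the quadratic blow-up applies to the general case.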

> SVRG is not popular in the deep learning community, but it should be noted
> that SVRG is different from Adam, since it does not tune the step size. Just
> to clarify, SVRG can be faster than Adam because it reduces the variance of
> the stochastic gradient to achieve a convergence rate similar to full-batch
> methods while remaining computationally cheap like SGD/Adam. However, one
> can combine both methods to obtain an even faster algorithm.
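For reference, the snapshot/inner-loop structure that lets SVRG avoid the per-sample gradient table can be sketched on a least-squares objective. This is a minimal illustration, not scikit-learn code; the function name and hyperparameter defaults are made up:

```python
import numpy as np

def svrg_least_squares(X, y, lr=0.05, n_epochs=100, seed=0):
    """Minimal SVRG sketch for f(w) = (1/2n) * ||Xw - y||^2 (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        w_snap = w.copy()                       # snapshot of the weights
        mu = X.T @ (X @ w_snap - y) / n         # full gradient at the snapshot
        for _ in range(n):
            i = rng.integers(n)
            xi = X[i]
            # Per-sample gradients at the current and snapshot weights;
            # nothing is stored across iterations, unlike SAG/SAGA.
            g_cur = (xi @ w - y[i]) * xi
            g_snap = (xi @ w_snap - y[i]) * xi
            # Variance-reduced step: unbiased for the full gradient,
            # with variance shrinking as w approaches w_snap.
            w -= lr * (g_cur - g_snap + mu)
    return w
```

The memory cost is just two weight vectors plus one averaged gradient, at the price of one extra full-gradient pass per epoch.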

> Cheers,
> Touqir

> On Tue, Sep 4, 2018 at 11:46 AM Andreas Mueller <t3kcit at gmail.com> wrote:

>     Hi Touqir.
>     We don't usually implement general purpose optimizers in
>     scikit-learn, in particular because usually different optimizers
>     apply to different kinds of problems.
>     For linear models we have SAG and SAGA, for neural nets we have adam.
>     I don't think the authors claim to be faster than SAG, so I'm not sure
>     what the motivation would be for using their method.

>     Best,
>     Andy

>     On 09/04/2018 12:55 PM, Touqir Sajed wrote:

>         Hi,

>         I have been looking for stochastic optimization algorithms in
>         scikit-learn that are faster than SGD, and so far I have come across
>         Adam and momentum. Are there other methods implemented in scikit-learn?
>         Particularly, the variance reduction methods such as SVRG
>         (https://papers.nips.cc/paper/4937-accelerating-stochastic-gradient-descent-using-predictive-variance-reduction.pdf)?
>         These variance reduction methods are the current state of the art in
>         terms of convergence speed while maintaining a runtime complexity of
>         order n -- the number of features. If they are not implemented yet, I
>         think it would be really great to implement them (I am happy to do
>         so), since working on large datasets (where L-BFGS may not be
>         practical) is now the norm, and the improvements are definitely
>         worth it.

>         Cheers,
>         Touqir
    Gael Varoquaux
    Senior Researcher, INRIA Parietal
    NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
    Phone:  ++ 33-1-69-08-79-68
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux
