[scikit-learn] Optimization algorithms in scikit-learn
gael.varoquaux at normalesup.org
Tue Sep 4 14:53:40 EDT 2018
This is out of the scope of scikit-learn, which is a toolkit meant to
make machine learning easier to use. Optimization is a component of machine
learning, but not one that is readily usable by itself.
On Tue, Sep 04, 2018 at 12:45:09PM -0600, Touqir Sajed wrote:
> Hi Andreas,
> Is there a particular reason why there is no general-purpose optimization
> module? Most of the optimizers (at least the first-order methods) are general
> purpose, since you just need to feed them the gradient. In some special cases,
> you probably need a problem-specific formulation for better performance. The
> advantage of SVRG is that you don't need to store the per-sample gradients,
> which costs storage of order number_of_weights*number_of_samples and is the
> main problem with SAG and SAGA. Thus, for most neural network models (and even
> non-NN models), using SAG and SAGA is infeasible on personal computers.
> SVRG is not popular in the deep learning community, but it should be noted that SVRG
> is different from Adam since it does not tune the step size. Just to clarify,
> SVRG can be faster than Adam since it decreases the variance to achieve a
> similar convergence rate as full batch methods while being computationally
> cheap like SGD/Adam. However, one can combine both methods to obtain an even
> faster algorithm.
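
To make the storage argument above concrete, here is a minimal NumPy sketch
of SVRG on a least-squares objective. It is only an illustration under stated
assumptions: the function name svrg_least_squares, the loss, the step size and
the epoch count are all hypothetical choices, and nothing like this exists in
scikit-learn. The point it shows is that the only extra state kept between
updates is one snapshot of the weights and one full gradient, both of size
n_features, rather than a table of per-sample gradients.

```python
import numpy as np


def svrg_least_squares(X, y, step=0.01, n_epochs=30, seed=0):
    """Illustrative SVRG for the loss 0.5 * mean((X @ w - y) ** 2).

    Hypothetical helper, not a scikit-learn API.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_epochs):
        # Snapshot of the weights and the full gradient at that snapshot.
        # These two vectors are the only extra memory SVRG needs.
        w_snap = w.copy()
        full_grad = X.T @ (X @ w_snap - y) / n_samples
        for _ in range(n_samples):
            i = rng.integers(n_samples)
            # Gradient of sample i at the current weights and at the snapshot.
            grad_i = X[i] * (X[i] @ w - y[i])
            grad_i_snap = X[i] * (X[i] @ w_snap - y[i])
            # Variance-reduced update with a fixed step size.
            w -= step * (grad_i - grad_i_snap + full_grad)
    return w
```

Calling svrg_least_squares(X, y) on a small, well-conditioned, noise-free
problem should recover the generating weights closely. The step size here is
fixed by hand and would need tuning for other data.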
> On Tue, Sep 4, 2018 at 11:46 AM Andreas Mueller <t3kcit at gmail.com> wrote:
> Hi Touqir.
> We don't usually implement general purpose optimizers in
> scikit-learn, in particular because usually different optimizers
> apply to different kinds of problems.
> For linear models we have SAG and SAGA, for neural nets we have Adam.
> I don't think the authors claim to be faster than SAG, so I'm not sure what
> the motivation would be for using their method.
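
For reference, the solvers mentioned above are selected per estimator through
the solver parameter rather than through a shared optimization module. A
minimal sketch, assuming a small synthetic dataset and arbitrary
hyperparameters chosen only for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Synthetic data, used here only so the example is self-contained.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Linear model fitted with SAGA ("sag" is selected the same way).
linear = LogisticRegression(solver="saga", max_iter=1000, random_state=0)
linear.fit(X, y)

# Small neural network fitted with Adam.
mlp = MLPClassifier(solver="adam", hidden_layer_sizes=(16,),
                    max_iter=500, random_state=0)
mlp.fit(X, y)
```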
> On 09/04/2018 12:55 PM, Touqir Sajed wrote:
> I have been looking for stochastic optimization algorithms in
> scikit-learn that are faster than SGD and so far I have come across
> Adam and momentum. Are there other methods implemented in scikit-learn?
> Particularly, the variance reduction methods such as SVRG (https://
> ) ? These variance reduction methods are the current state of the art
> in terms of convergence speed while maintaining runtime complexity of
> order n -- number of features. If they are not implemented yet, I think
> it would be really great to implement them (I am happy to do so), since
> nowadays working on large datasets (where L-BFGS may not be practical) is
> the norm, and there the improvements are definitely worth it.
Senior Researcher, INRIA Parietal
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68