[scikit-learn] IPython/Jupyter Kernel Dies when I fit an SGDClassifier

Sean Violante sean.violante at gmail.com
Sat Jun 3 15:58:10 EDT 2017


Have you used sparse arrays?
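
Something along these lines, for example (an untested sketch; it assumes the
3,600 high-cardinality features are mostly zeros, e.g. coming from one-hot
encoding, so only the nonzero entries need to be stored):

import numpy as np
from scipy import sparse
from sklearn.linear_model import SGDClassifier

n_samples, n_features = 900000, 3600

# Hypothetical one-hot-style data: one nonzero per row, built directly in
# CSR form so the dense n_samples x n_features array never exists in RAM.
rows = np.arange(n_samples)
cols = np.random.randint(0, n_features, size=n_samples)
data = np.ones(n_samples)
X = sparse.csr_matrix((data, (rows, cols)), shape=(n_samples, n_features))

y = np.random.randint(0, 2, size=n_samples)

model = SGDClassifier(loss='log')
model.fit(X, y)  # SGDClassifier accepts scipy.sparse input directly

Memory then scales with the number of stored nonzeros rather than with
n_samples * n_features, which is where the dense array hurts.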

On Fri, Jun 2, 2017 at 7:39 PM, Stuart Reynolds <stuart at stuartreynolds.net>
wrote:

> Hmmm... is it possible to place your original data into a memmap?
> (perhaps will clear out 8Gb, depending on SGDClassifier internals?)
>
> https://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html
> https://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas
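>
> A rough sketch of what I mean (assumes float64 data and a made-up file
> name 'X.dat'; whether it actually avoids the extra copy depends on
> SGDClassifier's internals):
>
> import numpy as np
> from sklearn.linear_model import SGDClassifier
>
> n_samples, n_features = 900000, 3600
>
> # Create the memmap once and fill it chunk by chunk from the original
> # source, so the full dense array never has to sit in RAM at once.
> X_mm = np.memmap('X.dat', dtype='float64', mode='w+',
>                  shape=(n_samples, n_features))
> # ... fill X_mm in chunks here ...
> X_mm.flush()
>
> # Re-open read-only; pages are loaded from disk on demand.
> X = np.memmap('X.dat', dtype='float64', mode='r',
>               shape=(n_samples, n_features))
> y = np.zeros(n_samples)
> y[:1000] = 1
>
> model = SGDClassifier(loss='log')
> model.fit(X, y)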
>
> - Stuart
>
> On Fri, Jun 2, 2017 at 10:30 AM, Sebastian Raschka <se.raschka at gmail.com>
> wrote:
> > I also think that this is likely a memory-related issue. I just ran the
> > following snippet in a Jupyter notebook:
> >
> > import numpy as np
> > from sklearn.linear_model import SGDClassifier
> >
> > model = SGDClassifier(loss='log', penalty=None, alpha=0.0,
> >                       l1_ratio=0.0, fit_intercept=False, n_iter=1,
> >                       shuffle=False, learning_rate='constant',
> >                       eta0=1.0)
> >
> > X = np.random.random((1000000, 1000))
> > y = np.zeros(1000000)
> > y[:1000] = 1
> >
> > model.fit(X, y)
> >
> >
> >
> > The dataset takes approx. 8 GB, but the model fitting consumes ~16 GB --
> > probably because a copy of the X array is made internally. The notebook
> > didn't crash for me, but on machines with less RAM this could be an issue.
> > One workaround you could try is to fit the model iteratively using
> > partial_fit, e.g. 1000 samples at a time or so:
> >
> >
> > indices = np.arange(y.shape[0])
> > batch_size = 1000
> >
> > for start_idx in range(0, indices.shape[0] - batch_size + 1,
> >                        batch_size):
> >     index_slice = indices[start_idx:start_idx + batch_size]
> >     model.partial_fit(X[index_slice], y[index_slice], classes=[0, 1])
> >
> >
> >
> > Best,
> > Sebastian
> >
> >
> >> On Jun 2, 2017, at 6:50 AM, Iván Vallés Pérez <ivanvallesperez at gmail.com> wrote:
> >>
> >> Are you monitoring your RAM consumption? I would say that it is the
> >> cause of the majority of kernel crashes.
> >> On Fri, 2 Jun 2017 at 12:45, Aymen J <ay.j at hotmail.fr> wrote:
> >> Hey Guys,
> >>
> >>
> >> So I'm trying to fit an SGD classifier on a dataset that has 900,000
> >> samples and about 3,600 features (high cardinality).
> >>
> >>
> >> Here is my model:
> >>
> >>
> >> model = SGDClassifier(loss='log', penalty=None, alpha=0.0,
> >>                       l1_ratio=0.0, fit_intercept=False, n_iter=1,
> >>                       shuffle=False, learning_rate='constant',
> >>                       eta0=1.0)
> >>
> >> When I run the model.fit function, the program runs for about 5
> >> minutes, and then I receive the message "the kernel has died" from
> >> Jupyter.
> >>
> >> Any idea what may cause that? Is my training data too big (in terms of
> >> features)? Is there anything I can do (e.g. change parameters) to
> >> finish training?
> >>
> >> Thanks in advance for your help!
> >>
> >>
> >>
>

