[scikit-learn] Memory efficient TfidfVectorizer

Tue Jan 28 05:19:47 EST 2020

Are you concerned about holding the whole corpus text in memory, or the
whole corpus's statistics? If it is the text, construct the vectorizer with
input='file' or input='filename' (or pass a generator of texts to fit).
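
A minimal sketch of both options, in case it helps. The tiny temporary
corpus and the iter_texts helper are made up for illustration; only
TfidfVectorizer and its input parameter come from scikit-learn.

```python
import os
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical three-document corpus written to temporary files,
# standing in for a corpus that is too large to load at once.
docs = ["the cat sat", "the dog barked", "a cat and a dog"]
tmpdir = tempfile.mkdtemp()
paths = []
for i, text in enumerate(docs):
    path = os.path.join(tmpdir, f"doc{i}.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    paths.append(path)

# Option 1: pass file names; the vectorizer opens and reads each file
# itself, one document at a time.
vec_files = TfidfVectorizer(input="filename")
X_files = vec_files.fit_transform(paths)


def iter_texts(file_paths):
    """Hypothetical helper: yield one document's text at a time."""
    for p in file_paths:
        with open(p, encoding="utf-8") as f:
            yield f.read()


# Option 2: keep the default input='content' and pass a generator, so
# only the document currently being tokenized is held as raw text.
vec_gen = TfidfVectorizer()
X_gen = vec_gen.fit_transform(iter_texts(paths))
```

Either way the raw text is read one document at a time, but the
vocabulary, the idf weights and the resulting sparse matrix are still
kept in memory.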

On Tue, 28 Jan 2020 at 18:01, Peng Yu <pengyu.ut at gmail.com> wrote:

> Hi,
>
> To use TfidfVectorizer, the whole corpus must be loaded into memory.
> This can be a problem for machines without a lot of memory. Is there a
> way to use only a small amount of memory by saving most intermediate
> results on disk? Thanks.
>
> --
> Regards,
> Peng