<div dir="ltr">Are you concerned about holding the whole corpus text in memory, or the whole corpus's statistics? If it is the text, pass input='file' or input='filename' to TfidfVectorizer (or pass a generator of texts), so that documents are read one at a time.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 28 Jan 2020 at 18:01, Peng Yu &lt;<a href="mailto:pengyu.ut@gmail.com">pengyu.ut@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>

<br>

To use TfidfVectorizer, the whole corpus must be loaded into memory.<br>

This can be a problem for machines without a lot of memory. Is there a<br>

way to use only a small amount of memory by saving most intermediate<br>

results on disk? Thanks.<br>

<br>

-- <br>

Regards,<br>

Peng<br>


</blockquote></div>
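<div dir="ltr">To illustrate the suggestion at the top of this message, here is a minimal sketch of both options: input='filename' and a generator of texts. The toy corpus, the temporary files and the helper iter_texts are hypothetical and only there to make the example self-contained; substitute your own paths.</div>

```python
import os
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer


def iter_texts(paths):
    """Hypothetical helper: yield one document at a time from disk."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            yield fh.read()


# Toy corpus written to temporary files (an assumption for illustration).
tmpdir = tempfile.mkdtemp()
docs = ["the cat sat on the mat", "the dog chased the cat", "dogs and cats"]
paths = []
for i, text in enumerate(docs):
    path = os.path.join(tmpdir, f"doc{i}.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    paths.append(path)

# Option 1: pass file names and let the vectorizer open each file itself.
vec_files = TfidfVectorizer(input="filename")
X_files = vec_files.fit_transform(paths)

# Option 2: pass a generator of texts (the default input="content").
vec_gen = TfidfVectorizer()
X_gen = vec_gen.fit_transform(iter_texts(paths))
```

<div dir="ltr">Either way the raw texts are not all held in memory at once. The fitted vocabulary and the resulting sparse matrix still are, so this addresses the text, not the statistics.</div>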