[scikit-learn] Memory efficient TfidfVectorizer
Peng Yu
pengyu.ut at gmail.com
Tue Jan 28 06:26:34 EST 2020
> Are you concerned about storing the whole corpus text in memory, or the
> whole corpus' statistics? If the text, use input='file' or input='filename'
> (or a generator of texts).
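The generator option can be sketched as follows. This assumes the corpus is a directory of UTF-8 `.txt` files; the directory layout and the helper name `iter_texts` are hypothetical. Only one document's text is held in memory at a time, but the vectorizer still accumulates the full corpus statistics.

```python
import os
from typing import Iterator


def iter_texts(directory: str) -> Iterator[str]:
    """Lazily yield the contents of each .txt file in `directory`, one file at a time."""
    for name in sorted(os.listdir(directory)):
        if name.endswith(".txt"):
            with open(os.path.join(directory, name), encoding="utf-8") as fh:
                yield fh.read()


# Hypothetical usage with scikit-learn (not executed here):
#   from sklearn.feature_extraction.text import TfidfVectorizer
#   X = TfidfVectorizer(ngram_range=(1, 3)).fit_transform(iter_texts("corpus/"))
# A generator can only be consumed once, so use fit_transform rather than
# fit followed by transform on the same generator.
```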
I am not really sure which stage takes the most memory, because my program
gets killed when it exceeds the memory limit. But I suspect it is the latter
(the whole-corpus statistics) that takes the most memory? (I used n-grams
with 1 <= n <= 3.)
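One rough way to check whether the corpus statistics are the problem is to count how many distinct n-grams a sample of the corpus produces, since the fitted vocabulary keeps one entry per distinct term. The sketch below is pure Python and assumes simple lowercase whitespace tokenization rather than scikit-learn's default `token_pattern`, so its counts only approximate the real `vocabulary_` size; `count_distinct_ngrams` is a hypothetical helper.

```python
from typing import Iterable


def count_distinct_ngrams(texts: Iterable[str], min_n: int, max_n: int) -> int:
    """Count distinct word n-grams with min_n <= n <= max_n.

    Tokenization is lowercase whitespace splitting, a simplifying assumption.
    """
    seen = set()
    for text in texts:
        tokens = text.lower().split()
        for n in range(min_n, max_n + 1):
            # Slide a window of length n over the token list.
            for i in range(len(tokens) - n + 1):
                seen.add(" ".join(tokens[i:i + n]))
    return len(seen)


docs = ["the cat sat on the mat", "the dog sat on the log"]
print(count_distinct_ngrams(docs, 1, 1))  # → 7
print(count_distinct_ngrams(docs, 1, 3))  # → 22
```

Even on this toy input, allowing n-grams up to length 3 roughly triples the number of distinct terms; on real text the gap is usually much larger.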
--
Regards,
Peng