How to make sure stop words are matched when lowercase=False?
Jan. 28, 2020
2:47 p.m.
Hi,

https://github.com/scikit-learn/scikit-learn/blob/002f891a33b612be389d9c4886...

The default of lowercase is True, and the built-in stop words are lower case. Where is the code that makes sure stop words are still removed when lowercase=False and the tokens are not lower case?

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_ext...

--
Regards,
Peng
January 2020
10:39 p.m.
There is no such code. You need to make sure that the normalisation you use matches the normalisation applied when constructing a stop word list. Unfortunately we do not provide for this directly, and it is not easy to do so in the general case.
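To illustrate the mismatch, here is a minimal sketch in plain Python rather than scikit-learn's actual code. The tiny stop list and the helper names (`tokenize`, `remove_exact`, `remove_case_insensitive`) are hypothetical, and the regular expression is only assumed to mirror the vectorizer's default token pattern. It compares exact matching against a lower-case stop list with a variant that lower-cases each token for the comparison only.

```python
import re

# Hypothetical mini stop list, all lower case (an assumption for illustration).
STOP_WORDS = frozenset({"the", "and", "is", "a"})

# Assumed to mirror the default token pattern: words of two or more characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Split text into tokens without changing their case."""
    return TOKEN_RE.findall(text)


def remove_exact(tokens, stop_words=STOP_WORDS):
    """Drop tokens that match a stop word exactly (case-sensitive)."""
    return [t for t in tokens if t not in stop_words]


def remove_case_insensitive(tokens, stop_words=STOP_WORDS):
    """Drop tokens whose lower-cased form is a stop word; keep original case."""
    return [t for t in tokens if t.lower() not in stop_words]


tokens = tokenize("The cat and THE dog")
exact = remove_exact(tokens)              # "The" and "THE" survive
folded = remove_case_insensitive(tokens)  # only "cat" and "dog" remain
print(exact)
print(folded)
```

A callable along these lines could be passed as the `analyzer` argument of a vectorizer, so that case is preserved in the features while stop words are still matched. Whether lower-casing is the right normalisation depends on the stop word list in use.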
Participants (2): Joel Nothman, Peng Yu