Replace stop words (remove words from a string)
gherron at islandtraining.com
Thu Jan 17 09:45:13 CET 2008
> if I have an array of "stop" words, and I want to replace those values
> with something else in a string, how would I go about doing this? I
> have this code that splits the string and then does a difference, but I
> think there is an easier approach:
> mystr =
> if I have an array stop_list = [ "[BAD]", "[BAD2]" ]
> I want to replace the values in that list with a zero length string.
> I had this before, but I don't want to use this approach; I don't want
> to use the split.
> line_list = line.lower().split()
> res = list(set(keywords_list).difference(set(ENTITY_IGNORE_LIST)))
Strings have a replace method that produces a new string with all
occurrences of one substring replaced with another. You'd have to loop
through your stop_list one word at a time.
>>> s = 'abcxyzabc'
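Applied to the stop_list from the question, the loop might look like the
sketch below. The helper name remove_stop_words and the sample line are
made up for illustration; only stop_list comes from the original post.

```python
def remove_stop_words(text, stop_list):
    """Return text with every entry of stop_list removed."""
    for word in stop_list:
        # str.replace returns a new string; the original is left unchanged,
        # so the result has to be rebound on every pass.
        text = text.replace(word, "")
    return text


stop_list = ["[BAD]", "[BAD2]"]
line = "keep [BAD] this [BAD2] text"  # hypothetical sample input
cleaned = remove_stop_words(line, stop_list)
print(repr(cleaned))
```

Note that only the stop words themselves are removed; the spaces that
surrounded them stay, so you may want to normalize whitespace afterwards.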
If either the string or the stop_list grows particularly large, this
approach won't scale very well, since the whole string is re-created
for each stop_list entry. In that case, I'd look into the regular
expression (re) module. You may be able to finagle a way to find and
replace all stop_list entries in one pass. (Finding them all is easy;
I'm not so sure you could replace them all at once, though.)
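
For what it's worth, here is a rough sketch of how a single pass might
work with the re module: join the escaped stop_list entries into one
alternation and hand it to sub. The helper name remove_stop_words_re is
hypothetical, and I have not measured whether this is actually faster for
any particular input.

```python
import re


def remove_stop_words_re(text, stop_list):
    """Remove every stop_list entry from text in one regex pass."""
    # Try longer entries first so a longer entry wins over a shorter
    # one that happens to be its prefix.
    ordered = sorted(stop_list, key=len, reverse=True)
    # re.escape makes characters such as "[" and "]" match literally
    # instead of being read as a character class.
    pattern = re.compile("|".join(re.escape(word) for word in ordered))
    return pattern.sub("", text)


stop_list = ["[BAD]", "[BAD2]"]
result = remove_stop_words_re("keep [BAD] this [BAD2] text", stop_list)
print(repr(result))
```

The sub method also accepts a function in place of the replacement
string, so each stop word could be mapped to a different replacement in
the same pass if an empty string is not what you want.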