fastest way for humongous regexp search?
Richie Hindle
richie at entrian.com
Tue Nov 2 11:50:59 EST 2004
[Tim]
> I've got a list of 1000 common misspellings, and I'd like to check a set
> of text files for those misspellings.
[Istvan]
> A much simpler way would be to just store these misspellings as a
> dictionary (or set), read and split each line into words, then check
> whether each of the words is in the set.
[Tim]
> Thanks, I didn't know that would be faster.
> But I need to match against the misspellings in a case-insensitive
> way--that's the reason I'm using the regular expressions.
Make the misspelling set lower case, and convert the list of words from
the text file into lower case before comparing them:
>>> from sets import Set
>>> misspellings = Set(['speling', 'misteak'])
>>> text = "Does this text contain any common speling mistakes?"
>>> print [word for word in text.split() if word.lower() in misspellings]
['speling']
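A fuller sketch of the same idea (mine, not from the thread) uses Python's
built-in set type and also strips surrounding punctuation, so that a word
like "misteak?" still matches; the word list and sample text here are
illustrative only:

```python
# Case-insensitive misspelling check using a built-in set.
# The misspelling list and sample text are made up for illustration.
import string

misspellings = {'speling', 'misteak'}

def find_misspellings(text):
    """Return misspelled words found in text, ignoring case and punctuation."""
    found = []
    for word in text.split():
        # Strip leading/trailing punctuation, then lower-case before the lookup.
        cleaned = word.strip(string.punctuation).lower()
        if cleaned in misspellings:
            found.append(cleaned)
    return found

print(find_misspellings("A Misteak? More common SPELING mistakes!"))
# prints ['misteak', 'speling']
```

Set membership is a constant-time hash lookup, so this stays fast even with
thousands of misspellings, unlike a single giant alternation regexp.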
--
Richie Hindle
richie at entrian.com