regex-strategy for finding *similar* words?
Daniel Dittmar
daniel.dittmar at sap.corp
Thu Nov 18 08:32:53 EST 2004
Christoph Pingel wrote:
> an interesting problem for regex nerds.
> I've got a thesaurus of some hundred words and a moderately large
> dataset of about 1 million words in some thousand small texts. Words
> from the thesaurus appear at many places in my texts, but they are often
> misspelled, just slightly different from the thesaurus.
The agrep project (http://www.tgries.de/agrep/) addresses this, and
Python bindings exist for it. agrep (approximate grep) lets you specify
the number of errors (insertions, deletions, substitutions) allowed in
a match.
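
To illustrate what "number of allowed errors" means, here is a minimal
pure-Python sketch. It does not use the agrep bindings (their API is not
shown here); it computes a plain Levenshtein edit distance instead, and
the names edit_distance, find_similar and max_errors are made up for
this example. It compares every word with every thesaurus entry, so it
is only meant to show the idea, not to be fast on a million words.

```python
import re

def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]

def find_similar(text, thesaurus, max_errors=1):
    """Return (word_in_text, thesaurus_entry) pairs within max_errors edits."""
    hits = []
    for word in re.findall(r"\w+", text.lower()):
        for entry in thesaurus:
            # Cheap filter: a length gap larger than max_errors cannot match.
            if abs(len(word) - len(entry)) > max_errors:
                continue
            if edit_distance(word, entry) <= max_errors:
                hits.append((word, entry))
    return hits

# Hypothetical sample data, for illustration only.
thesaurus = ["thesaurus", "python"]
text = "My thesarus entry mentions Pyhton and python."
print(find_similar(text, thesaurus, max_errors=2))
```

Note that a swapped pair of letters ("pyhton") counts as two errors
under plain Levenshtein distance, so max_errors=1 would miss it.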
Daniel