difflib
Mikkel Rasmussen
footech at get2net.dk
Thu Apr 26 17:19:32 EDT 2001
Tim Peters <tim.one at home.com> wrote in message
news:mailman.988244561.9037.python-list at python.org...
>
> > I'm going to test your algorithm later today to see if it
> > outperforms my own both in quality and speed. Quality is the most
> > important. Speed is just nice.
>
> You later reported getting good results. I was surprised! SequenceMatcher
> was written for ndiff.py specifically: something seeking to reconstruct how
> one well-formed version of a text document got transformed into another
> well-formed version. It's not at all trying to recover from noise or
> mistakes, and so has no interest in being graceful in the presence of, e.g.,
> transpositions or reversals. Insertions and deletions, yes. So I wouldn't
> expect it to be particularly effective for a spell-checker.
>
>
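
A small sketch of the point in the quoted text, using only
difflib.SequenceMatcher. The word pairs are made-up examples, not data from
the spelling checker, and similarity() is a hypothetical helper name. A
dropped letter costs one unmatched character, while a swap of two adjacent
letters is not seen as a single edit, so one of the swapped letters is left
unmatched as well.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return SequenceMatcher's ratio, 2*M/T (M = matched chars, T = total length)."""
    return SequenceMatcher(None, a, b).ratio()

# Hypothetical misspellings, chosen only for illustration.
deletion = similarity("spelling", "speling")        # one letter dropped
transposition = similarity("spelling", "spellign")  # two letters swapped

print("deletion:      %.3f" % deletion)
print("transposition: %.3f" % transposition)
```
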
The results were nearly as good as those of my own fuzzy match. The usual
problem is that it returns too many probable matches when there is a large
number of possibilities (and we are talking about dyslexics here!).
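
One way to keep the number of candidates down, sketched here with
difflib.get_close_matches and its n and cutoff parameters. The key list and
the cutoff values are assumptions for illustration only; the real dictionary
is far larger and the right cutoff would have to be found by experiment.

```python
from difflib import get_close_matches

# Hypothetical key list, standing in for the real dictionary.
keys = ["spelling", "spilling", "selling", "smelling", "spell", "sapling"]

# The defaults are n=3 and cutoff=0.6. A higher cutoff keeps only the
# closer candidates, and n caps how many are returned.
loose = get_close_matches("speling", keys, n=10, cutoff=0.6)
strict = get_close_matches("speling", keys, n=3, cutoff=0.85)

print(loose)
print(strict)
```
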
BTW, I optimised my own fuzzy match using your nice and fast ratio functions,
and it is nearly as fast as yours now. :-) Thanks for the optimisation idea!
There is less than a second's difference when searching through more than
140,000 keys.
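
A minimal sketch of how the three ratio functions in SequenceMatcher can be
layered, assuming this is the kind of optimisation meant here.
real_quick_ratio() and quick_ratio() are cheap upper bounds on ratio(), so a
key can be rejected without running the expensive comparison. best_match and
its parameters are hypothetical names, not part of difflib.

```python
from difflib import SequenceMatcher

def best_match(word, keys, cutoff=0.6):
    """Return (ratio, key) for the closest key, or None if none reaches cutoff."""
    matcher = SequenceMatcher()
    matcher.set_seq2(word)  # SequenceMatcher caches its analysis of seq2
    best = None
    for key in keys:
        matcher.set_seq1(key)
        threshold = best[0] if best else cutoff
        # Cheap upper bounds first: skip ratio() when they rule the key out.
        if matcher.real_quick_ratio() < threshold:
            continue
        if matcher.quick_ratio() < threshold:
            continue
        score = matcher.ratio()
        if score >= cutoff and (best is None or score > best[0]):
            best = (score, key)
    return best

# Hypothetical keys, for illustration only.
print(best_match("speling", ["spilling", "spelling", "sapling"]))
```
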
I have to say that the fuzzy match is not (technically) part of the
spelling checker. It is a final attempt to return useful results when the
main algorithm fails.
Still, it's nice to have a useful, simple sequence matcher in Python.
Mikkel Rasmussen
Now nearly a computational linguist. I'm finishing my master's thesis next
Friday. And now back to the final edition.