difflib

Mikkel Rasmussen footech at get2net.dk
Thu Apr 26 17:19:32 EDT 2001


Tim Peters <tim.one at home.com> wrote in message
news:mailman.988244561.9037.python-list at python.org...
>
> > I'm going to test your algorithm later today to see if it
> > outperforms my own both in quality and speed. Quality is the most
> > important. Speed is just nice.
>
> You later reported getting good results.  I was surprised! SequenceMatcher
> was written for ndiff.py specifically:  something seeking to reconstruct how
> one well-formed version of a text document got transformed into another
> well-formed version.  It's not at all trying to recover from noise or
> mistakes, and so has no interest in being graceful in the presence of, e.g.,
> transpositions or reversals.  Insertions and deletions, yes.  So I wouldn't
> expect it to be particularly effective for a spell-checker.
>
>
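Here is what "insertions and deletions, yes" means in practice. The sketch below uses `difflib.SequenceMatcher` to compare a one-letter insertion with a reversal of the same letters. The words "stop", "stops" and "pots" and the helper name `compare` are arbitrary choices made for illustration.

```python
from difflib import SequenceMatcher

def compare(a, b):
    """Return SequenceMatcher's similarity ratio and its edit script for a -> b."""
    sm = SequenceMatcher(None, a, b)
    # Each opcode is (tag, i1, i2, j1, j2); show the actual substrings involved.
    ops = [(tag, a[i1:i2], b[j1:j2]) for tag, i1, i2, j1, j2 in sm.get_opcodes()]
    return sm.ratio(), ops

if __name__ == "__main__":
    # One inserted letter: everything else lines up in order.
    print(compare("stop", "stops"))
    # Same four letters, reversed: only one letter can be matched in order.
    print(compare("stop", "pots"))
```

For the reversed pair, only a single letter ends up matched. `ratio()` therefore comes out at 2*1/8 = 0.25, even though both words contain exactly the same letters. A reversal is described as a pile of inserts and deletes, not as one cheap operation.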

In terms of quality, the results nearly equalled those of my own fuzzy match.
The usual problem is that it returns too many probable matches when there
are a large number of possibilities (and we are talking about dyslexics
here!).
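The standard library offers one way to trim an over-long candidate list: `difflib.get_close_matches`. Its `n` argument caps the number of suggestions, and its `cutoff` argument (a `ratio()` threshold between 0 and 1) drops weak ones. The post does not say this function was used, so treat the sketch below as an illustration only. The word list `LEXICON`, the wrapper `suggest` and the threshold values are all hypothetical.

```python
import difflib

# Hypothetical toy lexicon standing in for a real spelling dictionary.
LEXICON = ["receive", "recipe", "relieve", "deceive", "receiver",
           "reserve", "recede", "review", "believe", "retrieve"]

def suggest(word, lexicon, max_suggestions=3, min_ratio=0.75):
    """Return at most max_suggestions words from lexicon whose similarity
    ratio to word is at least min_ratio, best match first."""
    return difflib.get_close_matches(word, lexicon,
                                     n=max_suggestions, cutoff=min_ratio)

if __name__ == "__main__":
    # Loose settings: every word scoring >= 0.6 comes back.
    print(suggest("recieve", LEXICON, max_suggestions=len(LEXICON), min_ratio=0.6))
    # Tighter settings: fewer, stronger suggestions.
    print(suggest("recieve", LEXICON, max_suggestions=3, min_ratio=0.75))
```

Raising `cutoff` shrinks the list by quality. Lowering `n` shrinks it by rank. For users who produce many plausible targets, a small `n` is probably the more predictable of the two knobs.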

BTW, I optimised my own fuzzy match using your nice, fast ratio functions,
and it is now nearly as fast as yours. :-) Thanks for the optimisation idea!
The difference is less than a second when searching through more than
140,000 keys.
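The speed-up described here plausibly comes from the cheap upper bounds that `SequenceMatcher` provides. `real_quick_ratio()` and `quick_ratio()` never return less than `ratio()`, so any key that fails either bound can be rejected without running the full computation. `difflib.get_close_matches` applies the same cascade internally. Below is a minimal sketch of that filtering. It assumes the keys are a plain iterable of strings and uses a hypothetical helper name, `fuzzy_candidates`. It is not the poster's own code.

```python
from difflib import SequenceMatcher

def fuzzy_candidates(word, keys, cutoff=0.75):
    """Yield (score, key) for every key whose ratio() against word is >= cutoff.

    Cheap upper bounds are checked first, so most keys are rejected
    without the full (more expensive) ratio() computation.
    """
    matcher = SequenceMatcher()
    # SequenceMatcher caches its analysis of the second sequence,
    # so the fixed word goes in seq2 and only seq1 changes per key.
    matcher.set_seq2(word)
    for key in keys:
        matcher.set_seq1(key)
        if matcher.real_quick_ratio() < cutoff:
            continue  # length-based bound already too low
        if matcher.quick_ratio() < cutoff:
            continue  # letter-multiset bound already too low
        score = matcher.ratio()
        if score >= cutoff:
            yield score, key

if __name__ == "__main__":
    keys = ["receive", "recipe", "relieve", "deceive", "review", "xyz"]
    best = sorted(fuzzy_candidates("recieve", keys, cutoff=0.6), reverse=True)[:3]
    print(best)
```

Sorting the yielded `(score, key)` pairs in reverse and slicing gives the best few suggestions. The cascade changes only speed, not results: a key is reported if and only if its full `ratio()` meets the cutoff.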

I should add that the fuzzy match is not (technically) part of the
spelling checker. It is a final attempt to return useful results when the
main algorithm fails.

Still, it's nice to have a useful, simple sequence matcher in Python.


Mikkel Rasmussen
Now nearly a computational linguist. I'm finishing my master's thesis next
Friday. And now back to the final revision.




