Fuzzy string matching?
achrist at easystreet.com
Fri Aug 27 04:17:33 CEST 1999
I've gotten good results with ad hoc algorithms using a longest common
contiguous substring routine. There is an algorithm in the _Algorithms_
book by Rivest, et al., that produces the longest common non-contiguous
substring (the longest common subsequence), which might be a better
indicator of a match, but modifying it to test for only contiguous
substrings improves efficiency considerably, particularly space
efficiency. The lengths of the
two or three longest common contiguous substrings give a pretty good
indication of the degree of match in the applications I've tried (name
and address cleanup). How to combine these lengths into a scalar
measure of match is the really ad-hoc part of it.
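
For anyone who wants to try this, here is a rough sketch of the idea in
Python. It is not the code I described above, just one way it could be
done. The function names, the greedy "cut out the match and search again"
step, the min_len cutoff, and the scoring formula are all assumptions made
for illustration. It also assumes the inputs never contain the characters
\x00 or \x01, which are used as separators.

```python
def longest_common_substring(a, b):
    """Return (length, start_in_a, start_in_b) of the longest common
    contiguous substring. Only two rows of the table are kept, so the
    extra space is proportional to len(b) rather than len(a) * len(b)."""
    best_len = best_a = best_b = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_a = i - cur[j]
                    best_b = j - cur[j]
        prev = cur
    return best_len, best_a, best_b


def top_common_lengths(a, b, k=3, min_len=2):
    """Lengths of up to k common substrings, found greedily: take the
    longest match, cut it out of both strings, and repeat.
    (Hypothetical helper; assumes no \\x00 or \\x01 in the input.)"""
    lengths = []
    for _ in range(k):
        n, i, j = longest_common_substring(a, b)
        if n < min_len:
            break
        lengths.append(n)
        # Different separators in a and b, so the cut points never
        # match each other and the pieces cannot join into a new run.
        a = a[:i] + "\x00" + a[i + n:]
        b = b[:j] + "\x01" + b[j + n:]
    return lengths


def match_score(a, b, k=3):
    """One ad hoc scalar: total matched length over the shorter length.
    This formula is an assumption, not a recommendation."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return sum(top_common_lengths(a, b, k)) / shorter
```

With these definitions, top_common_lengths("john smith", "smith john")
gives [5, 4] ("smith", then "john") and match_score gives 0.9. Dividing by
the shorter length favors abbreviations such as "123 main st" against
"123 main street"; dividing by the longer length, or weighting the longest
run more heavily, would be equally defensible choices.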