Looking for library to estimate likeness of two strings
Steve Holden
steve at holdenweb.com
Thu Feb 7 08:38:24 EST 2008
Matthew_WARREN at bnpparibas.com wrote:
>> On Wed, 06 Feb 2008 17:32:53 -0600, Robert Kern wrote:
>>
>>> Jeff Schwab wrote:
>> ...
>>>> If the strings happen to be the same length, the Levenshtein distance
>>>> is equivalent to the Hamming distance.
>
> Is this really what the OP was asking for? If I understand it correctly,
> Levenshtein distance works out the number of edits required to transform
> the string into the target string. The smaller the distance, the closer
> the match, but with the OP's problem I would expect something like
>
>
> table1    table2
> brian     briam
>           erian
>
>
> I think the OP would like to guess at 'briam' rather than 'erian', but
> Levenshtein would rate them equally good guesses?
>
> I know this is pushing it more toward phonetic analysis of the words or
> something similar, and that's orders of magnitude more complex.
>
> Just in case,
>
> http://www.linguistlist.org/sp/Software.html#97
>
> might be a good place to start looking into it, along with the NLTK
> libraries here
>
> http://nltk.sourceforge.net/index.php/Documentation
>
You could perhaps use soundex to try to choose between different
possibilities with the same Levenshtein distance from the sample.
Soundex by itself is horrible, but it might work as a prioritizer.
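
A rough sketch of what I mean, in Python. The function names (levenshtein,
soundex, rank) are made up for illustration and do not come from any
library, and the soundex here is my own reading of the classic American
variant, so treat it as an assumption rather than a reference
implementation:

```python
def levenshtein(a, b):
    """Edit distance: each insertion, deletion or substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


# Letter groups of classic American Soundex (assumed variant).
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Four-character Soundex code, e.g. a letter followed by three digits."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    last = _CODES.get(word[0], "")
    for ch in word[1:]:
        code = _CODES.get(ch, "")
        if code and code != last:
            result += code
        if ch not in "hw":  # h and w do not separate equal codes; vowels do
            last = code
    return (result + "000")[:4]


def rank(sample, candidates):
    """Sort by edit distance; break ties in favour of a matching Soundex code."""
    target = soundex(sample)
    return sorted(candidates,
                  key=lambda c: (levenshtein(sample, c), soundex(c) != target))


print(rank("brian", ["erian", "briam"]))
```

With the example from the quoted message, 'briam' and 'erian' are both one
edit away from 'brian', but only 'briam' shares its soundex code, so it
sorts first. Soundex keeps the first letter, which is the only reason it
separates these two; it will not help when the candidates differ elsewhere
in a way it happens to ignore.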
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/