Find closest matching string based on collection of strings in list/dict/set

Shashwat Anand anand.shashwat at gmail.com
Tue Aug 31 23:25:58 CEST 2010


On Wed, Sep 1, 2010 at 2:31 AM, Joel Goldstick <
joel.goldstick at columbuswebmakers.com> wrote:

> python at bdurham.com wrote:
>
>> I'm parsing a simple, domain specific scripting language that has
>> commands like the following: *page, *title, *text, *footer, etc.
>> There are about 100 of these '*' commands. When we see a command
>> that we don't recognize, I would like to find the closest match
>> possible (from a list of all legal commands) and include this
>> suggestion in our diagnostic output.
>>
>> I'm not sure what you would call the type of algorithm I'm
>> looking for: closest matching string or auto-correct?
>>
>> Any suggestions on algorithms or python libraries that would help
>> me do what I'm looking for?
>>
>> Here's my short-list of ideas based on my research so far:
>> - Soundex
>> - Lawrence Philips' Metaphone Algorithm (from aspell?)
>> - Edit Distance Algorithm
>> - Levenshtein Word Distance
>>
>> Any suggestions, comments on the above techniques, or ideas on a
>> simpler algorithm for finding close string matches based on a
>> list, dict, or set of possible strings?
>>
>> Thank you,
>> Malcolm
>>
>>
> Have you looked at difflib?
>
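
For reference, here is a minimal sketch of what the difflib suggestion could
look like for the '*' commands. The command list is a hypothetical four-item
subset of the real one, and suggest_command is an illustrative name, not
anything from the original poster's code. difflib.get_close_matches is in the
standard library; the cutoff of 0.6 is its documented default and would
probably need tuning against the full command set.

```python
import difflib

# Hypothetical subset of the ~100 legal commands mentioned in the question.
LEGAL_COMMANDS = ["*page", "*title", "*text", "*footer"]


def suggest_command(unknown, commands=LEGAL_COMMANDS, cutoff=0.6):
    """Return the most similar legal command, or None if nothing is close."""
    # n=1 asks for only the single best match at or above the cutoff.
    matches = difflib.get_close_matches(unknown, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(suggest_command("*titel"))
    print(suggest_command("zzzzzz"))
```

Returning None when nothing clears the cutoff lets the diagnostic output skip
the suggestion rather than offer a misleading one.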

On a side note, Peter Norvig's article on writing a spelling corrector is
well worth reading:
http://norvig.com/spell-correct.html
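
The article builds its corrector around edit distance. With only about 100
legal commands, one simpler option is to compute the distance from the unknown
command to every legal one and take the minimum. The following is a rough
sketch of that idea, not code from the article: edit_distance, closest_command
and the max_distance threshold of 2 are all names and values assumed here for
illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def closest_command(unknown, commands, max_distance=2):
    """Return the legal command nearest to `unknown`, or None if too far."""
    best = min(commands, key=lambda c: edit_distance(unknown, c), default=None)
    if best is None or edit_distance(unknown, best) > max_distance:
        return None
    return best


if __name__ == "__main__":
    commands = ["*page", "*title", "*text", "*footer"]  # hypothetical subset
    print(closest_command("*titel", commands))
```

Plain Levenshtein distance counts a swap of two adjacent letters as two edits,
so a threshold below 2 would miss common transposition typos.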





-- 
~l0nwlf