Jaro-Winkler algorithm: Measuring similarity between strings
rogerb at rogerbinns.com
Sat Dec 20 02:54:22 CET 2008
> Based on examples and formulas from http://en.wikipedia.org/wiki/Jaro-Winkler.
> Useful for measuring similarity between two strings, for example if
> you want to detect that the user made a typo.
Jaro-Winkler is best when dealing with names (Winkler works for the US
census). There are pure Python and C accelerated implementations at
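
For readers who just want to see the shape of the calculation, here is a minimal pure Python sketch of Jaro-Winkler following the usual textbook formulas. It is not one of the implementations referred to above; the function names and the defaults (prefix scale 0.1, at most 4 prefix characters) are my own assumptions taken from the common description of the algorithm.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for strings sharing a common prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1.0 - sim)
```

The prefix boost is what makes this suit names: two strings that agree on their first few characters score higher than the plain Jaro value alone would give.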
If you are concerned about typos, then taking the keyboard layout into
account will help. For example, for a user with a US keyboard, hitting
the neighbouring 'a' or 'd' key would be a common typo for 's'.
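
A rough sketch of the keyboard idea, assuming a US QWERTY layout and treating only left/right neighbours on the same row as adjacent (a real layout also has diagonal neighbours, which this ignores). The names QWERTY_ROWS, NEIGHBORS and is_adjacent_key_typo are made up for illustration.

```python
# Letter rows of a US QWERTY keyboard (assumption: letters only).
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_neighbors(rows):
    """Map each key to the set of keys directly left and right of it."""
    neighbors = {}
    for row in rows:
        for i, key in enumerate(row):
            adjacent = set()
            if i > 0:
                adjacent.add(row[i - 1])
            if i + 1 < len(row):
                adjacent.add(row[i + 1])
            neighbors[key] = adjacent
    return neighbors


NEIGHBORS = build_neighbors(QWERTY_ROWS)


def is_adjacent_key_typo(typed, intended):
    """True if typed differs from intended by exactly one character
    and the wrong character sits next to the right one on the keyboard."""
    if len(typed) != len(intended):
        return False
    diffs = [(t, w) for t, w in zip(typed.lower(), intended.lower())
             if t != w]
    if len(diffs) != 1:
        return False
    wrong, right = diffs[0]
    return wrong in NEIGHBORS.get(right, set())
```

With this, is_adjacent_key_typo("liat", "list") is true because 'a' is next to 's', while "lipt" is not flagged. The same neighbour table could instead be used to lower the substitution cost inside an edit distance.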
Also consider Levenshtein distance:
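
In case it helps, a minimal sketch of Levenshtein distance using the standard dynamic programming approach, keeping only two rows in memory. The function name is my own; this is an illustration, not a tuned implementation.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```

Unlike Jaro-Winkler, which gives a similarity between 0 and 1, this returns a count of edits, so a small number means the strings are close.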
More information about the Python-list