Percentage matching of text

Tim Churches tchur at optushome.com.au
Fri Jul 30 11:20:06 EDT 2004


On Fri, 2004-07-30 at 23:52, Bruce Eckel wrote:
> What I'd like to do is find an algorithm that produces the results of
> a text comparison as a percentage-match. Thus I would be able to
> assert that my test samples must match the control sample by at least
> (for example) 83% for the test to pass. Clearly, this wouldn't be a
> perfect test but it would help flag problems, which is primarily what
> I need.
> 
> Does anyone know of an algorithm or library that would do this? Thanks
> in advance.

Python implementations of a range of such algorithms can be found in
Febrl - see section 9.2 of the manual:
http://datamining.anu.edu.au/projects/linkage.html#prototype_software

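I suspect that a simple bigram comparison would meet your needs best.
Alternatively, just use the difflib module in the Python standard
library: its SequenceMatcher class implements a comparator based on the
Ratcliff-Obershelp algorithm, and its ratio() method returns a
similarity score between 0 and 1.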
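
A minimal sketch of both suggestions follows, in case it helps. The
function names (difflib_percent, bigram_percent, bigrams) are made up
for illustration, and the bigram version is a plain Dice coefficient
over character bigrams, which is an assumption on my part and not
necessarily how Febrl computes it. Only difflib.SequenceMatcher and
collections.Counter come from the standard library.

```python
from collections import Counter
from difflib import SequenceMatcher


def difflib_percent(a, b):
    """Similarity of two strings as a percentage, using difflib.

    SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    matching characters and T is the combined length of both strings.
    """
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def bigrams(text):
    """Return the overlapping two-character substrings of text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def bigram_percent(a, b):
    """Similarity as a Dice coefficient over character bigrams.

    Illustrative only: counts bigrams shared by both strings (with
    multiplicity) and divides by the total number of bigrams.
    """
    ba, bb = bigrams(a), bigrams(b)
    if not ba and not bb:
        # Both strings are shorter than two characters, so there are
        # no bigrams to compare; fall back to exact equality.
        return 100.0 if a == b else 0.0
    # Counter intersection keeps the minimum count of each bigram.
    common = sum((Counter(ba) & Counter(bb)).values())
    return 100.0 * 2 * common / (len(ba) + len(bb))
```

A test along the lines Bruce describes could then be written as
`assert difflib_percent(test_output, control) >= 83.0`, where
test_output and control are whatever strings are being compared. The
two measures will generally give different numbers for the same pair
of texts, so the threshold needs tuning for whichever one is used.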
-- 

Tim C

PGP/GnuPG Key 1024D/EAF993D0 available from keyservers everywhere
or at http://members.optushome.com.au/tchur/pubkey.asc
Key fingerprint = 8C22 BF76 33BA B3B5 1D5B  EB37 7891 46A9 EAF9 93D0





