newb: comapring two strings
sjmachin at lexicon.net
Fri May 19 23:14:10 CEST 2006
Paul Rubin wrote:
> bearophileH... at lycos.com writes:
>>> I have suggested C because if the words are all of the same length then
>>> you have 30000^2 = 90 000 000 000 pairs to test.
>> Sorry, you have (n*(n-1))/2 pairs to test (~ 45 000 000 000).
> Still terrible. Use a better algorithm!
To put all this in perspective, here's a very similar real-world
You have a customer database with 300,000 records. Some of them are
duplicated, because there are differences in recording the customers'
names, like a one-keystroke typo. The task is to locate pairs of rows
which are likely to be duplicates. 45 billion comaprisons may be
And here's a clue for a better algorithm: Knuth's TAOCP vol 3 [1973
edition] chapter 6 (searching) third page "as if the words were spelled
backwards". If you find yourself reading about soundex, you've
definitely gone too far :-)
: "comapring" in the subject: is this a game of 'Spot the deliberate
mistake"? Is the OP counting a transposition of adjacent characters as
"varying by one character"?
More information about the Python-list