newb: comparing two strings

johnzenger at gmail.com johnzenger at gmail.com
Fri May 19 05:25:01 CEST 2006


manstey wrote:
> Hi,
>
> Is there a clever way to see if two strings of the same length vary by
> only one character, and what the character is in both strings.

You want zip.

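def diffbyonlyone(string1, string2):
    # Assumes equal lengths; zip stops at the end of the shorter string.
    diffcount = 0
    for c1, c2 in zip(string1, string2):
        if c1 != c2:
            diffcount += 1
            if diffcount > 1:
                # A second mismatch settles it, so stop early.
                return False
    return diffcount == 1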

print diffbyonlyone("yaqtil","yaqtel") # True
print diffbyonlyone("yiqtol","yaqtel") # False

If your strings are long, itertools.izip may be faster and more
memory-efficient: zip builds the whole list of character pairs up
front, while izip yields them one at a time.
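
The question also asked what the differing character is in each
string. A minimal sketch of one way to report that, written in Python 3
syntax (where the built-in zip is already lazy). The name
find_single_diff and its return convention are assumptions made for
illustration, not part of the function above:

```python
def find_single_diff(string1, string2):
    """Return (index, char1, char2) if the strings differ at exactly one
    position, otherwise None. Strings of unequal length never match."""
    if len(string1) != len(string2):
        return None
    found = None
    for index, (c1, c2) in enumerate(zip(string1, string2)):
        if c1 != c2:
            if found is not None:
                return None  # second mismatch: stop early
            found = (index, c1, c2)
    return found

print(find_single_diff("yaqtil", "yaqtel"))  # (4, 'i', 'e')
print(find_single_diff("yiqtol", "yaqtel"))  # None
```

Returning None for "no single difference" keeps the result usable in a
plain `if` test, since a matching tuple is always truthy.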

> My next problem is, I have a list of 300,000+ words and I want to find
> every pair of such strings. I thought I would first sort on length of
> string, but how do I iterate through the following:
>
> str1
> str2
> str3
> str4
> str5
>
> so that I compare str1 & str2, str1 & str3, str1 & str4, str1 & str5,
> str2 & str3, str2 & str4, str2 & str5, str3 & str4, str3 & str5,
> str4 & str5.

for index1 in xrange(len(words)):
    for index2 in xrange(index1+1,len(words)):
        if diffbyonlyone(words[index1], words[index2]):
            print words[index1] + " -- " + words[index2]
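
The same pairing can be sketched with itertools.combinations, which
yields each unordered pair exactly once. This assumes Python 2.6 or
later (the example uses Python 3 syntax). The helper name
pairs_differing_by_one and the sample word list are hypothetical, and
the comparison function is repeated here, with an added length check,
so the example runs on its own:

```python
from itertools import combinations

def diffbyonlyone(string1, string2):
    # True when the strings have equal length and exactly one
    # position holds different characters.
    if len(string1) != len(string2):
        return False
    diffcount = 0
    for c1, c2 in zip(string1, string2):
        if c1 != c2:
            diffcount += 1
            if diffcount > 1:
                return False
    return diffcount == 1

def pairs_differing_by_one(words):
    # combinations(words, 2) visits (0,1), (0,2), ..., (n-2,n-1),
    # the same order as the nested index loops.
    return [(w1, w2) for w1, w2 in combinations(words, 2)
            if diffbyonlyone(w1, w2)]

sample = ["yaqtil", "yaqtel", "yiqtol", "yiqtel"]  # made-up test data
for w1, w2 in pairs_differing_by_one(sample):
    print(w1 + " -- " + w2)
```

This does not reduce the number of comparisons; it only removes the
index bookkeeping.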

...but by all means run that only on groups of words that you have
already identified, by some criterion such as word length, as likely
matches.  Do the math: 300,000 words make roughly 4.5 * 10**10 pairs,
and that's a lot of comparisons!
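
One way to cut the work down, sketched here as a different technique
from the nested loop: for each word, blank out one position at a time
and group words by the resulting pattern. Two distinct words of equal
length differ in exactly one place precisely when they share such a
pattern. Assumptions: Python 3 syntax, duplicate words are ignored, and
the name find_near_pairs and the "\0" placeholder (assumed never to
occur in a word) are illustrative choices:

```python
from collections import defaultdict
from itertools import combinations

def find_near_pairs(words):
    """Return the set of (word1, word2) pairs, with word1 < word2, that
    have the same length and differ at exactly one position."""
    buckets = defaultdict(list)
    for word in set(words):  # drop duplicate words
        for i in range(len(word)):
            # Pattern: the word with position i replaced by a placeholder.
            pattern = word[:i] + "\0" + word[i + 1:]
            buckets[pattern].append(word)
    pairs = set()
    for group in buckets.values():
        # Words sharing a pattern agree everywhere except position i.
        for w1, w2 in combinations(sorted(group), 2):
            pairs.add((w1, w2))
    return pairs

sample = ["yaqtil", "yaqtel", "yiqtol", "yiqtel"]  # made-up test data
for w1, w2 in sorted(find_near_pairs(sample)):
    print(w1 + " -- " + w2)
```

Words of different lengths can never share a pattern, so no separate
sort by length is needed. The cost is one dictionary entry per
character of input, which trades memory for far fewer comparisons.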



