[Tutor] longest common substring

Fri Nov 11 15:32:00 CET 2011

On Fri, Nov 11, 2011 at 9:10 PM, Andreas Perstinger
<andreas.perstinger at gmx.net> wrote:
> On 2011-11-11 05:14, lina wrote:
>>
>> def LongestCommonSubstring(S1, S2):
>>     M = [[0]*(1+len(S2)) for i in xrange(1+len(S1))] ## creat 4*5 matrix
>>     longest, x_longest = 0, 0
>>     for x in xrange(1,1+len(S1)):                 ## read each row
>>         for y in xrange(1,1+len(S2)):             ## read each coloumn
>>             if S1[x-1] == S2[y-1]:
>>                 M[x][y] = M[x-1][y-1]+1
>>                 if M[x][y]>  longest:
>>                     longest = M[x][y]
>>                     x_longest = x
>>                 else:
>>                     M[x][y] = 0
>>     return S1[x_longest-longest:x_longest]
>
> That's still not the right version.
>
> If you compare your version to the one at wikibooks (
> http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring#Python
> ), you'll see that the else-branch is wrongly indented (one level too deep).
> It belongs to the first if-comparison:
>
> if S1 ...
>     M[x][y] ...
>     if M[x][y] ...
>        ...
> else: ...
>
>
>> if __name__=="__main__":
>>
>>     a=open("atom-pair_4.txt","r").readline().strip()
>>
>>     b=open("atom-pair_8.txt","r").readline().strip()
>>
>>
>> print(LongestCommonSubstring(LongestCommonSubstring(a,a),LongestCommonSubstring(b,b)))
>
>                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ??? What do you try to accomplish here ???
> You call "LongestCommonSubstring" with identical strings, thus the result
> must be the same string.
> Why not
>
> print(LongestCommonSubstring(a, b))
>
> as you did the line below with "c" and "d"?
>
> Further more I think that your problems start with bad data files. In every
> file there is just one very long line which looks like a string
> representation of a list of two-digits strings. This complicates further
> processing because you have to deal with all the unnecessary commas, blanks
> and single quotes between your numbers and the square brackets at the
> beginning and the end of the line.
>
>
>>  $ python3 LongestCommonSubstring.py
>> 2189
>> ['
>> ['82']
>>
>> The results are wrong.
>> c, d are the string from file atom-pair_4,txt, exactly the same as a,
>> d is the same as b.
>>
>> and even for (c,d) results are not correct, visually we can see some
>> similar groups, not mention the longest groups.
>
> And even if you use the correct function from wikibooks I can anticipate
> another problem :-)
> The implementation from wikibooks just returns the first common substring
> which it finds in the first string:
>
>>>> WikibooksLongestCommonSubstring("ABAB","BABA")
> 'ABA'
>>>> WikibooksLongestCommonSubstring("BABA", "ABAB")
> 'BAB'
>
> If there are more possible substrings with the same length (as in the
> example above) only the first one is returned.
>
> But in your example there are at least two different pathways (I've found
> three) which have the same length, as changing the order of the parameters
> will show you:
>
>>>> WikibooksLongestCommonSubstring(c, d)
> ['61', '70', '61']
>>>> WikibooksLongestCommonSubstring(d, c)
> ['83', '61', '83']

The residues of WikibooksLongestCommonSubstring(d, c) and
WikibooksLongestCommonSubstring(c,d) is different very largely,

I mean, it might be totally different.

so the possible subgroups are the union of groups of (d,c) and (c,d)?

I am really lack a programed-brain to think.

Thanks again,

>
> Bye, Andreas
>
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>