[Tutor] longest common substring

lina lina.lastname at gmail.com
Fri Nov 11 14:44:15 CET 2011

```On Fri, Nov 11, 2011 at 9:10 PM, Andreas Perstinger
<andreas.perstinger at gmx.net> wrote:
> On 2011-11-11 05:14, lina wrote:
>>
>> def LongestCommonSubstring(S1, S2):
>>     M = [[0]*(1+len(S2)) for i in xrange(1+len(S1))] ## creat 4*5 matrix
>>     longest, x_longest = 0, 0
>>     for x in xrange(1,1+len(S1)):                 ## read each row
>>         for y in xrange(1,1+len(S2)):             ## read each coloumn
>>             if S1[x-1] == S2[y-1]:
>>                 M[x][y] = M[x-1][y-1]+1
>>                 if M[x][y]>  longest:
>>                     longest = M[x][y]
>>                     x_longest = x
>>                 else:
>>                     M[x][y] = 0
>>     return S1[x_longest-longest:x_longest]
>
> That's still not the right version.
>
> If you compare your version to the one at wikibooks (
> http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring#Python
> ), you'll see that the else-branch is wrongly indented (one level too deep).
> It belongs to the first if-comparison:
>
> if S1 ...
>     M[x][y] ...
>     if M[x][y] ...
>        ...
> else: ...

Thanks, I was so careless and most important failure to give a deep
understanding.
>
>
>> if __name__=="__main__":
>>
>>
>>
>>
>> print(LongestCommonSubstring(LongestCommonSubstring(a,a),LongestCommonSubstring(b,b)))
>
>                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ??? What do you try to accomplish here ???
> You call "LongestCommonSubstring" with identical strings, thus the result
> must be the same string.
Yes, the results is the same string,
actually I want to remove the distractions of "," and as you noticed,
the blanks and single quotes between numbers and the square brackets
at the two terminal of the line.
read so many unnecessary and I have a trouble of removing those distractions.

> Why not
>
> print(LongestCommonSubstring(a, b))
>
> as you did the line below with "c" and "d"?
>
> file there is just one very long line which looks like a string
> representation of a list of two-digits strings. This complicates further
> processing because you have to deal with all the unnecessary commas, blanks
> and single quotes between your numbers and the square brackets at the
> beginning and the end of the line.
>
>
>>  \$ python3 LongestCommonSubstring.py
>> 2189
>> ['
>> ['82']
>>
>> The results are wrong.
>> c, d are the string from file atom-pair_4,txt, exactly the same as a,
>> d is the same as b.
>>
>> and even for (c,d) results are not correct, visually we can see some
>> similar groups, not mention the longest groups.
>
> And even if you use the correct function from wikibooks I can anticipate
> another problem :-)
> The implementation from wikibooks just returns the first common substring
> which it finds in the first string:
>
>>>> WikibooksLongestCommonSubstring("ABAB","BABA")
> 'ABA'
>>>> WikibooksLongestCommonSubstring("BABA", "ABAB")
> 'BAB'
>
> If there are more possible substrings with the same length (as in the
> example above) only the first one is returned.
You are right, I did not think of this parts before.

and actually the initiative wish was to find possible paths, I mean,
possible substrings, all possible substrings. not the longest one, but
at least bigger than 3.
>
> But in your example there are at least two different pathways (I've found
> three) which have the same length, as changing the order of the parameters
> will show you:
>
>>>> WikibooksLongestCommonSubstring(c, d)
> ['61', '70', '61']
>>>> WikibooksLongestCommonSubstring(d, c)
> ['83', '61', '83']
>
> Bye, Andreas