[Tutor] longest common substring
Andreas Perstinger
andreas.perstinger at gmx.net
Sun Nov 13 23:28:45 CET 2011
On 2011-11-11 14:44, lina wrote:
> You are right, I did not think of this parts before. and actually the
> initiative wish was to find possible paths, I mean, possible
> substrings, all possible substrings. not the longest one, but at
> least bigger than 3.
I had some time today and since you have changed your initial task (from
finding the longest common path to finding all common paths with a
minimum length) I've modified the code and came up with the following
solution:
def AllCommonPaths(list1, list2, minimum=3):
""" finds all common paths with a minimum length (default = 3)"""
# First we have to initialize the necessary variables:
# M is an empty table where we will store all found matches
# (regardless of their length)
M = [[0] * (len(list2)) for i in range(len(list1))]
# length is a dictionary where we store the length of each common
# path. The keys are the starting positions ot the paths in list1.
length = {}
# result will be a list of of all found paths
result =[]
# Now the hard work begins:
# Each element of list1 is compared to each element in list2
# (x is the index for list1, y is the index for list2).
# If we find a match, we store the distance to the starting point
# of the matching block. If we are in the left-most column (x == 0)
# or in the upper-most row (y == 0) we have to set the starting
# point ourself because we would get negative indexes if we look
# for the predecessor cell (M[x - 1][y - 1]). Else, we are one
# element farther away as the element before, so we add 1 to its
# value.
for x in range(len(list1)):
for y in range(len(list2)):
if list1[x] == list2[y]:
if (x == 0) or (y == 0):
M[x][y] = 1
else:
M[x][y] = M[x - 1][y - 1] + 1
# To get everything done in one pass, we update the length of
# the found path in our dictionary if it is longer than the minimum
# length. Thus we don't have to get through the whole table a
# second time to get all found paths with the minimum length (we
# don't know yet if we are already at the end of the matching
# block).
if M[x][y] >= minimum:
length[x + 1 - M[x][y]] = M[x][y]
# We now have for all matching blocks their starting
# position in list1 and their length. Now we cut out this parts
# and create our resulting list
for pos in length:
result.append(list1[pos:pos + length[pos]])
return result
I've tried to explain what I have done, but I'm sure you will still have
questions :-).
Is this close to what you want?
Bye, Andreas
PS: Here's the function again without comments:
def AllCommonPaths(list1, list2, minimum=3):
""" finds all common paths with a minimum length (default = 3)"""
M = [[0] * (len(list2)) for i in range(len(list1))]
length = {}
result =[]
for x in range(len(list1)):
for y in range(len(list2)):
if list1[x] == list2[y]:
if (x == 0) or (y == 0):
M[x][y] = 1
else:
M[x][y] = M[x - 1][y - 1] + 1
if M[x][y] >= minimum:
length[x + 1 - M[x][y]] = M[x][y]
for pos in length:
result.append(list1[pos:pos + length[pos]])
return result
More information about the Tutor
mailing list