[Tutor] longest common substring
lina
lina.lastname at gmail.com
Sat Nov 12 09:54:46 CET 2011
On Sat, Nov 12, 2011 at 5:49 AM, Andreas Perstinger
<andreas.perstinger at gmx.net> wrote:
> First, just a little rant :-)
> It doesn't help to randomly change some lines or introduce some new concepts
> you don't understand yet and then hope to get the right result. Your chances
> are very small that this will be successful.
> You should try to understand some basic concepts first and build on them.
> From your postings the last weeks and especially from today I have the
> impression that you still don't understand how fundamental programming
> concepts work: for-loops, differences between data types (strings, lists,
> sets, ...)
> Honestly, have you already read any programming tutorial? (You'll find a big
> list at http://wiki.python.org/moin/BeginnersGuide/NonProgrammers )? At the
> moment it looks like you are just copying some code snippets from different
> places and then you hopelessly try to modify them to suit your needs. IMHO
> the problems you want to solve are a little too big for you right now.
>
> Nevertheless, here are some comments:
Thanks, those are very valuable comments. For hours after reading your
post, my mind was haunted by what you pointed out.
It made me reflect a lot. I had/have a VERY BAD HABIT in learning
and doing things.
My father used to say I was the person who did not know how to walk
but started to run.
Later in life I realized this myself: I barely read a manual/usage
guide/map before I start messing things up.
(I did destroy something I had newly bought because I did not spend
2 minutes reading the instructions. I could not forget it because it
was expensive, haha ...).
In the past I could do fairly well on difficult questions, but I
failed more than once on basic concepts or detailed step-by-step
things.
But to answer very honestly: I did try to read some books. One is
Dive Into Python, another is Learn Python the Hard Way, and now I have
Programming Python by Mark Lutz and another Python book at my bedside.
The main problem was that I felt nothing when I read just for the
sake of reading, and I so easily forgot what I had read.
(Now I am a little worried that this bad habit will keep me from
going far or building something serious. Sigh ...)
In the past hours I tried to read up on the basic concepts, but I got
lost (not lost exactly, my mind just went empty and inactive) within
minutes.
Thanks again for pointing this out. I will remind myself in the future.
>
>> Based on former advice, I made a correction/modification on the below
>> code.
>>
>> 1] the set and subgroup does not work, here I wish to put all the
>> subgroup in a big set, the set like
>
> That's a good idea, but you don't use the set correctly.
>
>> subgroups=[]
>> subgroup=[]
>> def LongestCommonSubstring(S1, S2):
>
> I think it's better to move "subgroups" and "subgroup" into the function.
> (I've noticed that in most of your scripts you are using a lot of global
> variables. IMHO that's not the best programming style. Do you know what
> "global/local variables", "namespace", "scope" mean?)
>
> You are defining "subgroups" as an empty list, but later you want to use it
> as a set. Thus, you should define it as an empty set:
>
> subgroups = set()
>
> You are also defining "subgroup" as an empty list, but later you assign a
> slice of "S1" to it. Since "S1" is a string, the slice is also a string.
> Therefore:
>
> subgroup = ""
>
>>     M = [[0]*(1+len(S2)) for i in xrange(1+len(S1))]
>
> Peter told you already why "xrange" doesn't work in Python 3. But instead of
> using an alias like
>
> xrange = range
>
> IMHO it's better to change it in the code directly.
>
>>     longest, x_longest = 0, 0
>>     for x in xrange(1,1+len(S1)):
>>         for y in xrange(1,1+len(S2)):
>>             if S1[x-1] == S2[y-1]:
>>                 M[x][y] = M[x-1][y-1]+1
>>                 if M[x][y] > longest:
>>                     longest = M[x][y]
>>                     x_longest = x
>>                 if longest >= 3:
>>                     subgroup=S1[x_longest-longest:x_longest]
>>                     subgroups=set([subgroup])
>
> Here you overwrite in the first iteration your original empty list
> "subgroups" with the set of the list which contains the string "subgroup" as
> its only element. Do you really understand this line?
> And in all the following iterations you are overwriting this one-element set
> with another one-element set (the next "subgroup").
> If you want to add an element to an existing set instead of replacing it,
> you have to use the "add()"-method for adding an element to a set:
>
> subgroups.add(subgroup)
>
> This will add the string "subgroup" as a new element to the set "subgroups".
>
>>                     print(subgroups)
>>             else:
>>                 M[x][y] = 0
>>
>>     return S1[x_longest-longest:x_longest]
>
> Here you probably want to return the set "subgroups":
>
> return subgroups
I will return to these parts later.
Based on your advice, I updated the code to the one below (which partially works):
#!/usr/bin/python3

import os.path
from collections import Counter

INFILEEXT=".doc"

def CommonSublist(L1, L2):
    sublist=[]
    sublists=[]
    result=[]
    M = [[0]*(1+len(L2)) for i in range(1+len(L1))]
    longest, x_longest = 0, 0
    for x in range(1,1+len(L1)):
        for y in range(1,1+len(L2)):
            if L1[x-1] == L2[y-1]:
                M[x][y] = M[x-1][y-1]+1
                if M[x][y] > longest:
                    longest = M[x][y]
                    x_longest = x
                if longest >= 2:
                    sublist=L1[x_longest-longest:x_longest]
                    if sublist not in sublists:
                        sublists.append(sublist)
            else:
                M[x][y] = 0
    return sublists

if __name__=="__main__":
    for i in range(1,11):
        for j in range(1,11):
            if i != j:
                fileone="atom-pair_"+str(i)+".txt"
                filetwo="atom-pair_"+str(j)+".txt"
                a=open(fileone,"r").readline().strip().split(' ')
                b=open(filetwo,"r").readline().strip().split(' ')
                print(fileone,filetwo)
                print(CommonSublist(a,b))
The output results:
atom-pair_10.txt atom-pair_8.txt
[["'75',", "'64',"], ["'13',", "'64',", "'75',"], ["'64',", "'62',",
"'75',", "'16',"]]
atom-pair_10.txt atom-pair_9.txt
[["'65',", "'46',"], ["'13',", "'75',", "'64',"]]
Please feel free to give me comments.
Right now I am stuck on how to build a final result that contains
all the sublists (with duplicates) from the different file
combinations, and later get the occurrence counts of those sublists.
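One possible way to build such a combined result is sketched below. It is
only an illustration under assumptions: the names common_sublists and
count_common_sublists and the sample data are hypothetical, and each
sublist is converted to a tuple because lists are not hashable and so
cannot be used as Counter keys.

```python
from collections import Counter
from itertools import permutations


def common_sublists(L1, L2, min_len=2):
    """Hypothetical stand-in using the same table idea as CommonSublist."""
    M = [[0] * (1 + len(L2)) for _ in range(1 + len(L1))]
    longest, x_longest = 0, 0
    found = []
    for x in range(1, 1 + len(L1)):
        for y in range(1, 1 + len(L2)):
            if L1[x - 1] == L2[y - 1]:
                M[x][y] = M[x - 1][y - 1] + 1
                if M[x][y] > longest:
                    longest, x_longest = M[x][y], x
                if longest >= min_len:
                    sub = L1[x_longest - longest:x_longest]
                    if sub not in found:
                        found.append(sub)
    return found


def count_common_sublists(sequences):
    """Tally every common sublist over all ordered pairs (a, b) with a != b."""
    counts = Counter()
    for name_a, name_b in permutations(sequences, 2):
        for sub in common_sublists(sequences[name_a], sequences[name_b]):
            counts[tuple(sub)] += 1  # tuples are hashable, lists are not
    return counts


# Made-up sample data standing in for the first line of each file.
data = {
    "one": ["13", "64", "75", "16"],
    "two": ["64", "75", "99"],
    "three": ["13", "64", "75"],
}
counts = count_common_sublists(data)
for sub, n in counts.most_common():
    print(n, list(sub))
```

Like the i != j loops in the script, permutations() visits every pair in
both orders, so each pair of files is counted twice. If that is not
wanted, itertools.combinations(sequences, 2) visits each pair once.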
Frankly speaking, today I tried to figure out namespaces and scope.
This is what I tried:
                if longest >= 2:
                    sublist=L1[x_longest-longest:x_longest]
                    result=result.append(sublist)
                    if sublist not in sublists:
                        sublists.append(sublist)
and then:
$ python3 CommonSublists.py
atom-pair_1.txt atom-pair_2.txt
Traceback (most recent call last):
  File "CommonSublists.py", line 47, in <module>
    print(CommonSublist(a,b))
  File "CommonSublists.py", line 24, in CommonSublist
    result=result.append(sublist)
AttributeError: 'NoneType' object has no attribute 'append'
Inside the function (local scope) I set result=[].
I don't know why it complains that it is NoneType, since "result" is
used in nearly the same way as "sublists".
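For what it is worth, here is a minimal sketch of what seems to be going
on. It relies on list.append() changing the list in place and returning
None, so assigning its return value back to the name replaces the list
with None. The variable names kept, returned and broken are only for
illustration.

```python
# append() modifies the list in place; its return value is None.
kept = []
returned = kept.append(["'75',", "'64',"])
print(returned)  # the return value of append()
print(kept)      # the list itself was still updated

# Reproducing the failure: the assignment rebinds the name to None,
# so the next call to append() on it raises AttributeError.
broken = []
broken = broken.append("a")
try:
    broken.append("b")
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```

This would also explain why "sublists" behaves differently: it is only
ever changed with sublists.append(sublist), without assigning the return
value back to the name.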
>
>
>> 2] I still have trouble in reading files, mainly about not read "" etc.
>
> The problem is that in your data files there is just this big one-line
> string. AFAIK you have produced these data files yourself, haven't you? In
Yes. Those files were generated by myself. I have redone those things.
> that case it would be better to change the way how you save the data (be it
> a well-formatted string or a list or something else) instead of trying to
> fix it here (in this script).
>
> Bye, Andreas
Many thanks,
P.S. I put the whole directory at the links below:
tar.gz one:
https://docs.google.com/open?id=0B93SVRfpVVg3Y2Q2OWI1N2EtY2VmMi00MTQxLTgyYTctYmM0NDFkNGY1YzIz
zip one:
https://docs.google.com/open?id=0B93SVRfpVVg3ODM3NjQ2ZmEtMzgyMy00ODIxLWIxMTUtMDhmYmU0MGQzOWZj