Here is my quick take on it using the re module:<br><br><pre>import re

strings = ["1 ALA Helix Sheet Helix Coil",
           "2 ALA Coil Coil Coil Sheet",
           "3 ALA Helix Sheet Coil Turn",
           "4 ALA Helix Sheet Helix Sheet"]

# Capture each space-preceded word that occurs again later in the line;
# the \b anchors make the back-reference match whole words only.
regex = re.compile(r" (\w+)\b(?=.*\b\1\b)")

for s in strings:
    moreThanOnce = list(set(regex.findall(s)))
    count = len(moreThanOnce)
    if count == 1:
        print(moreThanOnce[0])
    elif count > 1:
        print("doubtful")
    else:
        print("error")
</pre>Although this is short, it's probably not the most efficient, since the lookahead rescans the rest of the line for every candidate word.<br>A more verbose but more efficient version would be<br>
<pre>for s in strings:
    # Skip the residue number and name; keep only the structure labels.
    labels = s.split()[2:]
    counts = {}
    for ss in labels:
        counts[ss] = counts.get(ss, 0) + 1
    filtered = [ss for ss in counts if counts[ss] >= 2]
    filteredCount = len(filtered)
    if filteredCount == 1:
        print(filtered[0])
    elif filteredCount > 1:
        print("doubtful")
    else:
        print("error")
</pre>HTH <br><br><div class="gmail_quote">On Sat, May 1, 2010 at 9:03 AM, mannu jha <span dir="ltr"><<a href="mailto:mannu_0523@rediffmail.com">mannu_0523@rediffmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">Dear all,<br>
<br>
I have tried to solve my problem this way:<br>
<br>
<pre>import re
expr = re.compile("Helix Helix| Sheet Sheet| Turn Turn| Coil Coil")
f = open("CalcSecondary4.txt")
for line in f:
    if expr.search(line):
        print(line)
</pre>
<br>
but this prints only those lines in which Helix, Sheet, Turn or Coil occurs twice in a row. Kindly suggest how I should modify it so that whichever secondary structure occurs two or more times in a line is written as the final secondary structure, and if two secondary structures each occur twice in the same line, like:<br>
<br>
4 ALA Helix Sheet Helix Sheet<br>
<br>
then that line should be reported as doubtful, and every other line as error.<br>
<br>
Thanks,<br>
<br>
<br>
Dear all,<br>
<br>
I have a file like:<br>
<br>
1 ALA Helix Sheet Helix Coil<br>
2 ALA Coil Coil Coil Sheet<br>
3 ALA Helix Sheet Coil Turn<br>
<br>
Now I want to write a Python program that checks each line: whichever secondary structure occurs two or more times is written as the final secondary structure, and if two secondary structures each occur twice in the same line, like:<br>
<br>
4 ALA Helix Sheet Helix Sheet<br>
<br>
then that line should be reported as doubtful, and every other line as error.<br>
<br>
Thanks,<br>--<br>
<a href="http://mail.python.org/mailman/listinfo/python-list" target="_blank">http://mail.python.org/mailman/listinfo/python-list</a><br>
<br></blockquote></div><br><br clear="all"><br>-- <br>Regards<br>Shashank Singh<br>Senior Undergraduate, Department of Computer Science and Engineering<br>Indian Institute of Technology Bombay<br><a href="mailto:shashank.sunny.singh@gmail.com">shashank.sunny.singh@gmail.com</a><br>
<a href="http://www.cse.iitb.ac.in/~shashanksingh">http://www.cse.iitb.ac.in/~shashanksingh</a><br>
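A minimal Python 3 sketch of the same counting approach using collections.Counter (available since Python 2.7 and 3.1), wrapped in functions so it can be applied to the file from the question. It assumes every line has the layout "number residue label label ...", as in the sample; the function names classify and classify_file are hypothetical, chosen only for illustration.

```python
from collections import Counter


def classify(line):
    """Classify one line such as "1 ALA Helix Sheet Helix Coil".

    Returns the label that occurs two or more times, "doubtful" if
    more than one label does, and "error" if no label repeats.
    """
    # Assumed layout: residue number, residue name, then the labels.
    labels = line.split()[2:]
    counts = Counter(labels)
    repeated = [name for name, n in counts.items() if n >= 2]
    if len(repeated) == 1:
        return repeated[0]
    if len(repeated) > 1:
        return "doubtful"
    return "error"


def classify_file(path):
    """Yield (line, result) for every non-blank line of the file at path."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines instead of reporting them as "error"
                yield line, classify(line)


if __name__ == "__main__":
    sample = [
        "1 ALA Helix Sheet Helix Coil",
        "2 ALA Coil Coil Coil Sheet",
        "3 ALA Helix Sheet Coil Turn",
        "4 ALA Helix Sheet Helix Sheet",
    ]
    for line in sample:
        print(line, "->", classify(line))
    # Prints Helix, Coil, error and doubtful for the four lines.
```

With a file such as CalcSecondary4.txt, iterating over classify_file("CalcSecondary4.txt") would give one result per non-blank line. Counter does the same single pass as the dictionary version, just with less bookkeeping.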