[Tutor] Script for Parsing string sequences from a file

Joel Goldstick joel.goldstick at gmail.com
Fri Apr 15 14:57:50 CEST 2011


Sorry, I hit send too soon on my last message.

On Fri, Apr 15, 2011 at 8:54 AM, Joel Goldstick <joel.goldstick at gmail.com> wrote:

>
>
> On Fri, Apr 15, 2011 at 8:41 AM, Spyros Charonis <s.charonis at gmail.com> wrote:
>
>> Hello,
>>
>> I'm doing a biomedical degree and am taking a course on bioinformatics. We
>> were given a raw version of a public database in a file (the file is in
>> simple ASCII) and need to extract only certain lines containing important
>> information. I've made a script that does not work and I am having trouble
>> understanding why.
>>
>> When I run it in the Python shell, it prompts for a protein name but then
>> reports that there is no such entry. The first while loop, nested inside a
>> for loop, is intended to pick up all lines beginning with "gc;", chop off the
>> "gc;" part and keep only the text after it (which is a protein name).
>> It then scans the file, collects all such lines, chops off the "gc;" and
>> stores them in a tuple. This tuple is not built correctly: as I said, when
>> the program runs it reports that it cannot find my query in the tuple I
>> created, even though the entry is certainly in the database. Can you detect
>> what the mistake is? Thank you in advance!
>>
>> Spyros
>>
> import os, string
>
> printsdb = open('/users/spyros/folder1/python/PRINTSmotifs/prints41_1.kdat', 'r')
> lines = printsdb.readlines()
>
> # find PRINTS name entries
> # you need to have a list to collect your strings:
> protnames = []
> for line in lines:   # this gets you each line
>     #while line.startswith('gc;'):  this is wrong
>     if line.startswith('gc;'):     # do this instead
>         # lstrip('gc;') removes a set of characters, not a prefix, and it
>         # leaves the trailing newline from readlines() in place, so slice
>         # off the prefix and trim the whitespace instead
>         protnames.append(line[len('gc;'):].strip())   # add the name to the list
>
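A note on chopping off the "gc;" part, since it is easy to trip over: str.lstrip takes a set of characters rather than a prefix, and readlines() keeps the newline at the end of each line. Below is a minimal sketch of one way to do it. The helper name strip_prefix and the sample lines are made up for illustration and are not taken from the real PRINTS file.

```python
def strip_prefix(line, prefix):
    """Return the trimmed text after `prefix`, or None if the line lacks it."""
    if not line.startswith(prefix):
        return None
    # Slice by length so only the exact prefix is removed, then trim
    # surrounding whitespace, including the trailing newline.
    return line[len(prefix):].strip()


# Hypothetical sample lines (assumed layout, not real database content).
sample = ["gc; GLUCAGON\n", "gx; PR00275\n", "gc; CATALASE\n"]

names = [strip_prefix(line, "gc;") for line in sample]
names = [name for name in names if name is not None]

# lstrip works on a character set: here it also eats the leading 'c' of the name.
mangled = "gc;cyclin\n".lstrip("gc;")

# With a space after the prefix, lstrip stops early and the newline survives.
untrimmed = "gc; GLUCAGON\n".lstrip("gc;")

print(names)
print(repr(mangled))
print(repr(untrimmed))
```

On Python 3.9 or later, line.removeprefix('gc;').strip() does the same job as the slice.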

# try doing something like:
print(protnames)   # this should give you a list of all your lines that started with 'gc;'
# this block I don't understand (note that break is only valid inside a loop):


> if not protnames:
>     print('error in creating tuple')  # check if tuple is true or false
>     #print(protnames)
>     break
>
>
Now you have protnames with all of your protein names.
See if the above helps; then you have the part below to figure out.

> query = input("search a protein: ")
> query = query.upper()
> if query in protnames:
>     print("\nDisplaying Motifs")
> else:
>     print("\nentry not in database")
>
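One thing to watch with "query in protnames": the test is an exact string comparison, so a stored name that still carries a newline or stray space will never match what the user typed. A rough sketch of normalizing both sides before comparing follows. The function names build_index and is_known and the sample data are assumptions for illustration only.

```python
def build_index(names):
    """Normalize names (trim whitespace, upper-case) into a set."""
    return {name.strip().upper() for name in names}


def is_known(query, index):
    """Check a user query against the normalized set."""
    return query.strip().upper() in index


# Hypothetical names as they would look if the newline were never removed.
raw_names = ["GLUCAGON\n", "CATALASE\n"]

# Exact comparison fails because "GLUCAGON" != "GLUCAGON\n".
found_raw = "GLUCAGON" in raw_names

index = build_index(raw_names)
found_normalized = is_known("glucagon ", index)

print(found_raw, found_normalized)
```

A set also makes each lookup fast, which may matter if the database has many entries.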
> # Parse motifs
> def extract_motifs(query):
>     motif_id = ()
>     motif = ()
>     while query in lines:  ####for query, get motif_ids and motifs
>         while line.startswith('ft;'):
>             motif_id = line.lstrip('ft;')
>             motif_ids = (motif_id)
>             #print(motif_id)
>             while line.startswith('fd;'):
>                 motif = line.lstrip('fd;')
>                 motifs = (motif)
>             #print(motif)
>             return motif_id, motif
>
> if __name__ == '__main__':
>     final_motifs = extract_motifs('query')
>
>
>
> --
> Joel Goldstick
>
>
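For the motif part, a few hints. In the quoted extract_motifs, line is never advanced inside the while loops, "query in lines" compares against whole lines, and the function is called with the literal string 'query' rather than the variable. Here is a rough sketch of a single pass over the lines. It assumes that each entry starts with a "gc;" line and that its "ft;" and "fd;" lines follow before the next "gc;" line; I have not checked that against the real file, and the sample data is invented.

```python
def extract_motifs(lines, query):
    """Collect 'ft;' and 'fd;' payloads from the entry whose 'gc;' name matches."""
    wanted = query.strip().upper()
    motif_ids, motifs = [], []
    in_entry = False
    for line in lines:
        if line.startswith("gc;"):
            # A new entry begins; remember whether it is the one we want.
            in_entry = line[len("gc;"):].strip().upper() == wanted
        elif in_entry and line.startswith("ft;"):
            motif_ids.append(line[len("ft;"):].strip())
        elif in_entry and line.startswith("fd;"):
            motifs.append(line[len("fd;"):].strip())
    return motif_ids, motifs


# Hypothetical sample in the assumed layout (not real PRINTS content).
sample_lines = [
    "gc; ALPHA\n",
    "ft; ALPHA1\n",
    "fd; AAAA\n",
    "fd; BBBB\n",
    "gc; BETA\n",
    "ft; BETA1\n",
    "fd; CCCC\n",
]

ids, motifs = extract_motifs(sample_lines, "alpha")
print(ids, motifs)
```

Lists are used instead of tuples because tuples cannot be appended to. If the real file is laid out differently, the in_entry flag is the part to adapt.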


-- 
Joel Goldstick