[Tutor] Script for Parsing string sequences from a file
Joel Goldstick
joel.goldstick at gmail.com
Fri Apr 15 14:57:50 CEST 2011
Sorry, I hit send too soon on my last message.
On Fri, Apr 15, 2011 at 8:54 AM, Joel Goldstick <joel.goldstick at gmail.com> wrote:
>
>
> On Fri, Apr 15, 2011 at 8:41 AM, Spyros Charonis <s.charonis at gmail.com> wrote:
>
>> Hello,
>>
>> I'm doing a biomedical degree and am taking a course on bioinformatics. We
>> were given a raw version of a public database in a file (the file is in
>> simple ASCII) and need to extract only certain lines containing important
>> information. I've made a script that does not work and I am having trouble
>> understanding why.
>>
>> When I run it in the Python shell, it prompts for a protein name but then
>> reports that there is no such entry. The first while loop, nested inside a
>> for loop, is intended to pick up all lines beginning with "gc;", chop off
>> the "gc;" part and keep only the text after it (which is a protein name).
>> It then scans the file, collects all such lines, chops off the "gc;" and
>> stores them in a tuple. This tuple is not built correctly: as I said, when
>> the program is run it reports that it cannot find my query in the tuple I
>> created, even though the entry is certainly in the database. Can you detect
>> what the mistake is? Thank you in advance!
>>
>> Spyros
>>
> import os, string
>
> printsdb = open('/users/spyros/folder1/python/PRINTSmotifs/prints41_1.kdat', 'r')
> lines = printsdb.readlines()
>
> # find PRINTS name entries
> # you need to have a list to collect your strings:
> protnames = []
> for line in lines:  # this gets you each line
>     # while line.startswith('gc;'):  this is wrong
>     if line.startswith('gc;'):  # do this instead
>         # this adds your stripped string to the protnames list; slicing
>         # plus strip() also drops the trailing newline, which
>         # lstrip('gc;') would leave in place
>         protnames.append(line[len('gc;'):].strip())
>
# try doing something like:
print(protnames)  # this should give you a list of all your lines that
                  # started with 'gc;'
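
Two details are worth checking when collecting the names: readlines() keeps
the trailing newline on every line, and str.lstrip('gc;') treats its argument
as a set of characters to remove, not as a prefix. Below is a minimal sketch,
assuming name lines look like "gc; NAME". The sample lines and the helper
collect_names are made up for illustration and are not taken from the real
file.

```python
# Made-up sample lines standing in for printsdb.readlines(); the real
# file layout is an assumption here.
sample_lines = [
    "gc; CYCLINKINASE\n",
    "gx; PR00001\n",
    "gc; GLABLOOD\n",
]

def collect_names(lines, prefix="gc;"):
    """Return the text after `prefix` for every line that starts with it."""
    names = []
    for line in lines:
        if line.startswith(prefix):
            # Slice off the prefix, then strip the blank and the newline.
            names.append(line[len(prefix):].strip())
    return names

protnames = collect_names(sample_lines)
print(protnames)

# For comparison: lstrip('gc;') removes any leading 'g', 'c' or ';'
# characters and leaves the trailing newline in place.
print(repr("gc; GLABLOOD\n".lstrip("gc;")))
```

If the names in your list still carry a leading blank or a newline, a later
"query in protnames" test will fail even when the entry is in the file.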
# this block I don't understand
> if not protnames:
>     print('error in creating tuple')  # check if tuple is true or false
>     #print(protnames)
>     break
>
>
Now you have protnames with all of your protein names.
See if the above helps; then you have the part below to figure out.
> query = input("search a protein: ")
> query = query.upper()
> if query in protnames:
>     print("\nDisplaying Motifs")
> else:
>     print("\nentry not in database")
>
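
The lookup only succeeds when the query and the stored names are cleaned up
the same way. Here is a small sketch of that idea. The helpers normalize and
is_known and the sample names are hypothetical, and it assumes that comparing
upper-cased names is what you want.

```python
def normalize(name):
    """Strip surrounding whitespace and upper-case a name for comparison."""
    return name.strip().upper()

def is_known(query, names):
    """Return True if `query` matches one of `names` after normalizing both."""
    known = {normalize(n) for n in names}
    return normalize(query) in known

# Hypothetical names, including one with leftover whitespace and a newline.
protnames = [" CYCLINKINASE\n", "GLABLOOD"]

print(is_known("cyclinkinase", protnames))
print(is_known("notthere", protnames))
```

Building the set once and reusing it would be cheaper if you look up many
queries; it is rebuilt on every call here only to keep the sketch short.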
> # Parse motifs
> def extract_motifs(query):
>     motif_id = ()
>     motif = ()
>     while query in lines:  #### for query, get motif_ids and motifs
>         while line.startswith('ft;'):
>             motif_id = line.lstrip('ft;')
>             motif_ids = (motif_id)
>             #print(motif_id)
>         while line.startswith('fd;'):
>             motif = line.lstrip('fd;')
>             motifs = (motif)
>             #print(motif)
>     return motif_id, motif
>
> if __name__ == '__main__':
>     final_motifs = extract_motifs('query')
>
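
Some hints for the motif part: the while loops never move on to another line,
so a single for loop over the lines is the usual replacement, and
extract_motifs('query') passes the literal string 'query' instead of the
variable. The sketch below is one possible shape, not a tested solution for
the real file. It assumes (I have not checked this against the database) that
an entry runs from its 'gc;' line to the next 'gc;' line and that its 'ft;'
and 'fd;' lines sit in between. The sample lines and strip_prefix are made up.

```python
def strip_prefix(line, prefix):
    """Return the text after `prefix`, without surrounding whitespace."""
    return line[len(prefix):].strip()

def extract_motifs(query, lines):
    """Collect 'ft;' and 'fd;' text for the entry whose 'gc;' name is `query`.

    Assumes (not confirmed by the thread) that an entry runs from its
    'gc;' line up to the next 'gc;' line.
    """
    motif_ids, motifs = [], []
    in_entry = False
    for line in lines:
        if line.startswith('gc;'):
            # A new entry starts; remember whether it is the one we want.
            in_entry = strip_prefix(line, 'gc;').upper() == query.upper()
        elif in_entry and line.startswith('ft;'):
            motif_ids.append(strip_prefix(line, 'ft;'))
        elif in_entry and line.startswith('fd;'):
            motifs.append(strip_prefix(line, 'fd;'))
    return motif_ids, motifs

# Made-up lines for illustration only; not real PRINTS data.
sample_lines = [
    "gc; PROTA\n",
    "ft; PROTA1\n",
    "fd; AAAA\n",
    "fd; CCCC\n",
    "gc; PROTB\n",
    "ft; PROTB1\n",
    "fd; GGGG\n",
]

ids, motifs = extract_motifs('prota', sample_lines)
print(ids)
print(motifs)
```

Lists are used instead of tuples because they can grow with append() while
the file is being scanned.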
--
Joel Goldstick