[Tutor] FASTA FILE SUB-SEQUENCE EXTRACTION

Tue Mar 8 16:34:13 EST 2016

On 08/03/16 12:19, syed zaidi wrote:

One thing. This is a plain text mailing list and, because Python is
whitespace dependent, you need to post in plain text, not HTML/RTF, or the
layout gets lost in the mail system (as you can see below).

The main things you need to tell us are what libraries you are
using to read the FASTA data, which OS, and which Python version.

> Well, FASTA is a file format used by biologists to store
> biological sequences. The format is as under:

> >sequence information (sequence name, sequence length etc)
> genomic sequence
> >sequence information (sequence name, sequence length etc)
> genomic sequence
>
> I want to match the name of sequence with another list of sequence
> names and splice the sequence by the provided list of start and end
> sites for each sequence, so the pseudo code could be:
>
> if line starts with '>':
>     match the header name with sequence name:
>         if sequence name found:
>             splice from the given start and end positions of that sequence
>
> The code I have devised so far is:
>
> import os
> with open('E:/scaftig.sample - Copy.scaftig','r') as f:
>     header = f.readline()
>     header = header.rstrip(os.linesep)
>     sequence = ''
>     for line in f:
>         line = line.rstrip('\n')
>         if line[0] == '>':
>             header = header[:]
>             print header
>         if line[0] != '>':
>             sequence+= line
>         print sequence, len(sequence)
>
> I would appreciate if you can help.
>
> Thanks
> Best Regards
> Ali
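For what it's worth, a few things in the code above will bite you: the
header is never updated (header = header[:] just copies the old one), the
sequence string is never reset so every record's bases end up in one long
string, line[0] raises IndexError on a blank line, and print without
parentheses is Python 2 only. Here is a minimal sketch of the approach you
describe in plain Python 3 with no third-party libraries. It assumes, since
we don't know your data, that the sequence name is the first word after the
'>' and that your start/end sites are 1-based and inclusive; the names
read_fasta, extract, sites and the scaf1/scaf2 sample records are made up
for illustration.

```python
# Minimal sketch, plain Python 3, no third-party libraries.
# Assumptions (not from the original post): the sequence name is the first
# word after '>', and the start/end sites are 1-based and inclusive.
# read_fasta, extract and the sample data below are hypothetical names.


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:                  # skip blank lines instead of crashing on line[0]
            continue
        if line.startswith('>'):
            if name is not None:      # a new header ends the previous record
                yield name, ''.join(chunks)
            words = line[1:].split()
            name = words[0] if words else ''
            chunks = []               # reset, so records don't run together
        else:
            chunks.append(line)       # sequences may span several lines
    if name is not None:              # the last record has no header after it
        yield name, ''.join(chunks)


def extract(records, sites):
    """Return {name: sub-sequence} for each record whose name is in sites.

    sites is a dict mapping name -> (start, end).
    """
    result = {}
    for name, seq in records:
        if name in sites:
            start, end = sites[name]
            # convert 1-based inclusive coordinates to a Python slice
            result[name] = seq[start - 1:end]
    return result


if __name__ == '__main__':
    sample = [
        '>scaf1 length=12',
        'ACGTACGT',
        'ACGT',
        '>scaf2 length=8',
        'TTTTGGGG',
    ]
    sites = {'scaf1': (3, 6), 'scaf2': (1, 4)}
    print(extract(read_fasta(sample), sites))
    # prints {'scaf1': 'GTAC', 'scaf2': 'TTTT'}
```

To use it on your own file you can pass the open file object straight in,
e.g. with open('E:/scaftig.sample - Copy.scaftig') as f: subseqs =
extract(read_fasta(f), sites), where sites is built from your list of names
and start/end positions. If you are already using Biopython,
Bio.SeqIO.parse(f, 'fasta') does the parsing step for you and each record
has .id and .seq attributes.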

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos