[Tutor] Taking FASTA file as an input in Python 3

Mats Wichmann mats at wichmann.us
Sun Oct 20 13:51:33 EDT 2019


On 10/20/19 11:00 AM, Mihir Kharate wrote:
> Hello,
> 
> I want my python program to ask for an input that accepts the FASTA files.
> FASTA files are a type of text files that we use in bioinformatics. The
> first line in a FASTA file is a description about the gene it is encoding.
> The data starts with the second line. An example of the fasta format would
> be:
> 
>> NC_003423.3:c429013-426160 Schizosaccharomyces pombe chromosome II, complete sequence
> ATGGAAAAAATAAAACTTTTAAATGTAAAAACTCCCAATCATTATACTATTATTTTCAAGGTGGTGGCAT
> ACTACAGCGCACTTCAACCTAACCAAAACGAACTACGAAAAGTACGAATGCTTGCTGCTGAAAGTTCTAA
> TGTTAATGGATTATTTAAATCAGTAGTTGCTGTTTTAGATTGTGATGATGAAACGGTACTATTTTGAATT
> ATCAATTGGGTTTGCTGACTTTGTTTACCTAGAAAGAATTGTTCATTAAAAATGACGGGAAAGCTTTGAG
> TTTTCCGTATGACTGGAAGCTGGCAACTCATGTTATATGCGATGACTTTTCCTCTCCTAATGTACAAGAA
> 
> 
> I found the following code online and tried to print it to see whether the
> first line is overread:
> 
>>   DNA_sequence = open ("sequence.fasta" , "r")
>>   DNA_sequence.readline()
>>   print ("DNA_sequence")
> 
> However, this prints the following statement;
>>   <_io.TextIOWrapper name='sequence.fasta' mode='r' encoding='cp1252'>

You cannot have sent us the program you are actually using, because as 
written, the output must be *exactly*

DNA_sequence

If you are printing it without the quote marks, then you will get what 
you have pasted: DNA_sequence is the name bound to the open file object, 
and print() shows a description of that object, not the file's contents.
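
To make the difference concrete, here is a small sketch of the three things 
one might end up printing. The temporary file and its contents are made up 
for illustration and only stand in for sequence.fasta; the variable names 
literal, file_repr and data are likewise just for this example.

```python
import os
import tempfile

# Hypothetical stand-in for sequence.fasta, so the example runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">example header\nATGGAAAAAA\nTGTTAATGGA\n")
    path = tmp.name

with open(path, "r") as DNA_sequence:
    header = DNA_sequence.readline()  # keep the first line instead of discarding it
    literal = "DNA_sequence"          # a string that merely spells the variable's name
    file_repr = repr(DNA_sequence)    # the text that print(DNA_sequence) would show
    data = DNA_sequence.read()        # the rest of the file's text

os.remove(path)  # clean up the temporary file

print(literal)    # the literal text inside the quotes
print(file_repr)  # a description of the file object, not its contents
print(data)       # the sequence lines actually read from the file
```

Only the last print shows the sequence, because only data holds what was 
read from the file.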

If you want to actually print the data being read from the file, you 
need to print what the read gives you back, not the file object itself. 
Maybe something like this:

with open("sequence.fasta", "r") as DNA_sequence:
    DNA_sequence.readline()  # throw away the first (description) line
    for line in DNA_sequence:
        print(line, end="")  # each line already ends with a newline
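
If the goal is to work with the sequence rather than just print it, one 
possible approach is to collect the lines into a single string. The sketch 
below assumes a FASTA file with exactly one record; read_fasta is a 
hypothetical helper name, and the sample text is a shortened, made-up 
stand-in for the real file.

```python
import io


def read_fasta(handle):
    """Return (header, sequence) from an open single-record FASTA file.

    Hypothetical helper for illustration; assumes one record per file.
    """
    header = handle.readline().strip()
    if header.startswith(">"):
        header = header[1:].strip()  # drop the leading '>' marker
    # Join the remaining lines, removing newlines and skipping blank lines.
    sequence = "".join(line.strip() for line in handle)
    return header, sequence


# Made-up sample data standing in for sequence.fasta.
sample = io.StringIO(
    ">NC_003423.3 example record\n"
    "ATGGAAAAAATAAAAC\n"
    "ACTACAGCGCACTTCA\n"
)
header, sequence = read_fasta(sample)
print(header)
print(sequence)
print(len(sequence))
```

With a real file, the same function could be called as 
"with open(filename) as f: header, sequence = read_fasta(f)", where 
filename might come from input("FASTA file: ") if the program should ask 
for it.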




