[Tutor] Analysing genetic code (DNA) using python

Mon Mar 6 16:36:50 CET 2006

I have many notepad documents that all contain long chunks of genetic 
code. They look something like this: 

atggctaaactgaccaagcgcatgcgtgttatccgcgagaaagttgatgcaaccaaacag 
tacgacatcaacgaagctatcgcactgctgaaagagctggcgactgctaaattcgtagaa 
agcgtggacgtagctgttaacctcggcatcgacgctcgtaaatctgaccagaacgtacgt 
ggtgcaactgtactgccgcacggtactggccgttccgttcgcgtagccgtatttacccaa 

Basically, I want to design a program using Python that can open and 
read these documents. However, I want them to be read 3 base pairs at a 
time (to analyse them codon by codon) and to look up the value assigned 
to each codon. An example of this is below: 

** If the three base pairs were UUU the value assigned to it (from the 
codon value table) would be 0.296 
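
A minimal sketch of the codon-by-codon reading described above. The names 
split_codons and CODON_VALUES are hypothetical, and the table holds only 
the single UUU entry quoted in this message; the remaining codons would 
have to come from the real codon value table. 

```python
def split_codons(seq):
    """Return seq as a list of 3-letter codons.

    Any incomplete codon at the end (1 or 2 leftover letters) is dropped.
    """
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical table: only the UUU value is taken from the message above.
# The other 63 codons must be filled in from the real codon value table.
CODON_VALUES = {"UUU": 0.296}

codons = split_codons("UUUUUUUU")           # 8 letters: 2 codons + 2 leftover
print(codons)                               # ['UUU', 'UUU']

values = [CODON_VALUES[c] for c in codons]  # raises KeyError on unknown codons
print(values)                               # [0.296, 0.296]
```

Stepping through the string with range(0, usable, 3) is one way to make 
sure every full codon is visited exactly once. 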

The program has to read the whole sequence three base pairs at a time. 
I then want to get the value for each codon, multiply them all together 
and raise the product to the power of 1 / the length of the sequence in 
codons (which is the length of the whole sequence divided by three). 
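
The calculation in the paragraph above is a geometric mean. A rough sketch 
follows, assuming every codon value is greater than zero. The function name 
geometric_mean is hypothetical. Instead of multiplying directly, it sums 
logarithms, which gives the same result mathematically but avoids the 
product shrinking to 0.0 when hundreds of small values are multiplied. 

```python
import math


def geometric_mean(values):
    """Return (v1 * v2 * ... * vn) ** (1 / n) for positive values.

    Computed as exp(mean(log(v))) so that long sequences of small
    numbers do not underflow to zero.
    """
    if not values:
        raise ValueError("need at least one codon value")
    log_sum = sum(math.log(v) for v in values)
    return math.exp(log_sum / len(values))


# Two codons with values 1.0 and 4.0: (1.0 * 4.0) ** (1 / 2) is 2.0
result = geometric_mean([1.0, 4.0])
```

Here len(values) is the number of codons, i.e. the sequence length divided 
by three. 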

However, to make things even more complicated, the notepad sequences 
are in lowercase and the codon value table is in uppercase, so the 
sequences need to be converted into uppercase. Also, the Ts in the DNA 
sequences need to be changed to Us (again to match the codon value 
table). And finally, before the DNA sequences are read and analysed I 
need to remove the first 50 codons (i.e. the first 150 letters) and the 
last 20 codons (the last 60 letters) from the DNA sequence. I've also 
been having problems ensuring the program reads ALL the sequence 3 
letters at a time. 
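
The clean-up steps above could be sketched as follows. The function name 
prepare_sequence and its parameters are hypothetical. It also assumes the 
files contain line breaks or spaces between the chunks of sequence; if so, 
those extra characters would shift the reading frame, which is one possible 
reason for not reading all of the sequence three letters at a time. 

```python
def prepare_sequence(raw, skip_start=50, skip_end=20):
    """Clean raw file text and trim codons from both ends.

    raw        -- text as read from the file, e.g. open(path).read()
    skip_start -- number of codons to drop from the front
    skip_end   -- number of codons to drop from the back
    """
    # Remove newlines and spaces so the letters form one continuous string.
    seq = "".join(raw.split())
    # Match the codon value table: uppercase, with U in place of T.
    seq = seq.upper().replace("T", "U")
    # Drop the first skip_start codons and the last skip_end codons.
    start = skip_start * 3
    end = len(seq) - skip_end * 3
    return seq[start:end]  # empty string if the sequence is too short


# Small check with made-up input: 150 letters, two codons, then 60 letters.
raw = "a" * 150 + "ttt\nggg " + "c" * 60
print(prepare_sequence(raw))  # UUUGGG
```

Computing the end position from len(seq) rather than using a negative slice 
index keeps the function correct when skip_end is 0. 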

I've tried various ways of doing this but keep coming unstuck along the 
way. Has anyone got any suggestions for how they would tackle this 
problem? 
Thanks for any help received! 
