processing the genetic code with python?

Mon Mar 6 15:54:23 EST 2006

James Stroud wrote:
> nuttydevil wrote:
> 
>> I have many notepad documents that all contain long chunks of genetic
>> code. They look something like this:
>>
>> atggctaaactgaccaagcgcatgcgtgttatccgcgagaaagttgatgcaaccaaacag
>> tacgacatcaacgaagctatcgcactgctgaaagagctggcgactgctaaattcgtagaa
>> agcgtggacgtagctgttaacctcggcatcgacgctcgtaaatctgaccagaacgtacgt
>> ggtgcaactgtactgccgcacggtactggccgttccgttcgcgtagccgtatttacccaa
>>
>> Basically, I want to design a program using python that can open and
>> read these documents. However, I want them to be read 3 base pairs at a
>> time (to analyse them codon by codon) and find the value assigned to
>> each codon. An example of this is below:
>>
>> ** If the three base pairs were UUU the value assigned to it (from the
>> codon value table) would be 0.296
>>
>> The program has to read all the sequence three pairs at a time, then I
>> want to get all the values for each codon, multiply them together and
>> put them to the power of 1 / the length of the sequence in codons
>> (which is the length of the whole sequence divided by three).
>>
>> However, to make things even more complicated, the notepad sequences
>> are in lowercase and the codon value table is in uppercase, so the
>> sequences need to be converted into uppercase. Also, the Ts in the DNA
>> sequences need to be changed to Us (again to match the codon value
>> table). And finally, before the DNA sequences are read and analysed I
>> need to remove the first 50 codons (i.e. the first 150 letters) and the
>> last 20 codons (the last 60 letters) from the DNA sequence. I've also
>> been having problems ensuring the program reads ALL the sequence 3
>> letters at a time.
>>
>> I've tried various ways of doing this but keep coming unstuck along the
>> way. Has anyone got any suggestions for how they would tackle this
>> problem?
> 
> 
> Yes: use python.
> 
>> Thanks for any help received!
>>
> 
> I couldn't help myself. I strongly suggest you study this example. It 
> will cut your coding time way down in the future.
> 
> I'm writing your name down and this is the last time I'm doing homework 
> for you.
> 
> James
> 
> 
> from operator import mul
> 
> table = { 'AUG' : 0.98999, 'CCC' : 0.9755 } # <== you fill this in
> trim_front = 50
> trim_back = 20
> filename = 'sequence.txt'  # <== hypothetical name; point this at your file
> 
> # Why I did this:
> # Python >=1 line per thought; you have to love it
> data = "".join([s.strip() for s in open(filename)])
> data = data.upper().replace('T', 'U')
> codons = [data[i:i+3] for i in xrange(0, len(data), 3)]  # Alex Martelli
> trimmed = codons[trim_front:-trim_back]
> product = reduce(mul, [table[codon] for codon in trimmed])  # trimmed only
> value = product**(1.0/len(trimmed))  # <== is this really ALL codons?
> 
> print value       # useless print statement
> 
> 

I noticed a typo. Should be "Python <= 1 line per thought".
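
For anyone reading this later, here is a minimal sketch of the same steps as a function, written for Python 3 (range instead of xrange, no builtin reduce). It is only an illustration under some assumptions: the function name codon_geometric_mean and the two-entry TABLE are hypothetical placeholders, and a real codon value table has to be filled in. It sums logarithms instead of multiplying, because a product of many values below 1 can underflow to 0.0 on a long sequence, and it ignores a trailing partial codon if the length is not a multiple of three.

```python
import math

# Hypothetical two-entry table for illustration only; fill in the real
# codon value table before using this on actual sequences.
TABLE = {"AUG": 0.98999, "CCC": 0.9755}


def codon_geometric_mean(dna, table, trim_front=50, trim_back=20):
    """Geometric mean of the codon values, after trimming codons at both ends."""
    # Drop all whitespace (including newlines), uppercase, and convert T to U.
    rna = "".join(dna.split()).upper().replace("T", "U")
    # Ignore a trailing partial codon so every chunk is exactly 3 letters.
    usable = len(rna) - len(rna) % 3
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    # Slice with an explicit end index so trim_back=0 keeps the tail.
    trimmed = codons[trim_front:len(codons) - trim_back]
    if not trimmed:
        raise ValueError("no codons left after trimming")
    # The mean of the logs equals the log of the geometric mean.
    log_sum = sum(math.log(table[codon]) for codon in trimmed)
    return math.exp(log_sum / len(trimmed))


if __name__ == "__main__":
    # Tiny made-up sequence; trimming is disabled because it is so short.
    print(codon_geometric_mean("atgccc\natg", TABLE, trim_front=0, trim_back=0))
```

A codon that is missing from the table raises KeyError here, which is probably what you want while the table is still being filled in.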

James

-- 
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/