processing the genetic code with python?
James Stroud
jstroud at ucla.edu
Mon Mar 6 15:54:23 EST 2006
James Stroud wrote:
> nuttydevil wrote:
>
>> I have many notepad documents that all contain long chunks of genetic
>> code. They look something like this:
>>
>> atggctaaactgaccaagcgcatgcgtgttatccgcgagaaagttgatgcaaccaaacag
>> tacgacatcaacgaagctatcgcactgctgaaagagctggcgactgctaaattcgtagaa
>> agcgtggacgtagctgttaacctcggcatcgacgctcgtaaatctgaccagaacgtacgt
>> ggtgcaactgtactgccgcacggtactggccgttccgttcgcgtagccgtatttacccaa
>>
>> Basically, I want to design a program using python that can open and
>> read these documents. However, I want them to be read 3 base pairs at a
>> time (to analyse them codon by codon) and to look up the value
>> assigned to each codon. An example of this is below:
>>
>> ** If the three base pairs were UUU the value assigned to it (from the
>> codon value table) would be 0.296
>>
>> The program has to read all the sequence three pairs at a time, then I
>> want to get all the values for each codon, multiply them together and
>> put them to the power of 1 / the length of the sequence in codons
>> (which is the length of the whole sequence divided by three).
>>
>> However, to make things even more complicated, the notepad sequences
>> are in lowercase and the codon value table is in uppercase, so the
>> sequences need to be converted into uppercase. Also, the Ts in the DNA
>> sequences need to be changed to Us (again to match the codon value
>> table). And finally, before the DNA sequences are read and analysed I
>> need to remove the first 50 codons (i.e. the first 150 letters) and the
>> last 20 codons (the last 60 letters) from the DNA sequence. I've also
>> been having problems ensuring the program reads ALL the sequence 3
>> letters at a time.
>>
>> I've tried various ways of doing this but keep coming unstuck along the
>> way. Has anyone got any suggestions for how they would tackle this
>> problem?
>
>
> Yes: use python.
>
>> Thanks for any help received!
>>
>
> I couldn't help myself. I strongly suggest you study this example. It
> will cut your coding time way down in the future.
>
> I'm writing your name down and this is the last time I'm doing homework
> for you.
>
> James
>
>
> from operator import mul
>
> filename = 'sequence.txt' # <== hypothetical name; point it at your file
> table = { 'AUG' : 0.98999, 'CCC' : 0.9755 } # <== you fill this in
> trim_front = 50
> trim_back = 20
>
> # Why I did this:
> # Python >=1 line per thought; you have to love it
> data = "".join([s.strip() for s in open(filename)])
> data = data.upper().replace('T', 'U')
> codons = [data[i:i+3] for i in xrange(0, len(data), 3)] # Alex Martelli
> trimmed = codons[trim_front:-trim_back]
> product = reduce(mul, [table[codon] for codon in trimmed]) # trimmed, not codons
> value = product**(1.0/len(trimmed)) # <== is this really ALL codons?
>
> print value # useless print statement
>
>
I noticed a typo. Should be "Python <= 1 line per thought".
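
The same steps can also be sketched in Python 3 syntax. This is only a rough sketch under a few assumptions: the names split_codons and geometric_mean are made up for illustration, and the two-entry table and toy sequence are placeholders, not real codon values. It differs from the code above in two ways. It sums logarithms instead of multiplying, since a product of several hundred values below 1 can underflow to 0.0, and it ignores an incomplete codon at the end of the sequence instead of failing on the table lookup.

```python
import math

def split_codons(sequence, trim_front=50, trim_back=20):
    """Upper-case, map T to U, cut into codons, and trim both ends."""
    # split() with no arguments discards newlines and any other whitespace.
    rna = "".join(sequence.split()).upper().replace("T", "U")
    # Stop at the last complete triplet so a stray trailing base is ignored.
    usable = len(rna) - len(rna) % 3
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    # Use an explicit end index: codons[50:-0] would be an empty list.
    end = max(len(codons) - trim_back, 0)
    return codons[trim_front:end]

def geometric_mean(codons, table):
    """Geometric mean of the table values, computed through logarithms."""
    if not codons:
        raise ValueError("no codons left after trimming")
    log_sum = sum(math.log(table[codon]) for codon in codons)
    return math.exp(log_sum / len(codons))

# Hypothetical table and toy sequence, for illustration only.
table = {"AUG": 0.5, "CCC": 0.125}
codons = split_codons("atgccc atgccc\n", trim_front=1, trim_back=1)
print(codons)
print(geometric_mean(codons, table))
```

With the toy input, the trimmed list holds CCC and AUG, and the result is the square root of 0.5 * 0.125, which is about 0.25. For a real file, pass the text from open(filename).read() and keep the default trims of 50 and 20 codons.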
James
--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095
http://www.jamesstroud.com/