[Baypiggies] string to list question

Vikram K kpguy1975 at gmail.com
Fri Aug 6 06:50:58 CEST 2010


Hi Glen,
Thanks for your response. I'm afraid I did not present the problem clearly
in my original query. The more general question is: what if:

z = 'ATC/GACTGAGC/TAG'

and I want
zlist = ['ATC/G','ACT','GAG','C/TAG']
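One way to get zlist is sketched below. It assumes the sequence contains only uppercase A, C, G and T, and that each SNP is written as exactly two bases joined by a slash. A regular expression first splits the string into bases, trying the 'X/Y' alternative before a plain letter so the pair stays together. The bases are then joined back in groups of three. The names BASE, tokenize and codons are only illustrative.

```python
import re

# A base is either a two-base alternative such as 'C/G' or a single letter.
# The alternative is listed first so the regex prefers it when a '/' follows.
BASE = re.compile(r'[ACGT]/[ACGT]|[ACGT]')

def tokenize(seq):
    """Split a sequence into bases, keeping 'X/Y' alternatives together."""
    return BASE.findall(seq)

def codons(seq):
    """Group the tokenized bases into consecutive triplets (codons)."""
    bases = tokenize(seq)
    return [''.join(bases[i:i + 3]) for i in range(0, len(bases), 3)]

print(tokenize('AT/CG'))           # ['A', 'T/C', 'G']
print(codons('ATC/GACTGAGC/TAG'))  # ['ATC/G', 'ACT', 'GAG', 'C/TAG']
```

Note that findall silently skips characters the pattern does not recognise, such as lowercase letters or N. A sequence whose base count is not a multiple of three also leaves a shorter final group. Real data would need both cases handled.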

The biology behind this is not what you understood. Here is the problem
for you and anyone else interested (I am simplifying this as much as I can,
since I don't know your biological background):
'C/G' and 'C/T' are SNPs (single nucleotide polymorphisms, which can be
thought of simply as 'change') in a particular genome being studied when
compared to the NCBI reference genome. A specific nucleotide (say 'A') is
being represented by two alternative nucleotides (say 'A/G') in the genome
being investigated. The alternative nucleotides occur because the two copies
of the chromosome differ at that position (think of this as a difference
between the paternal and maternal DNA at that position).

When I take the exon regions of a gene (the ones that code for protein) in the
genome being studied, I need to break up the DNA string corresponding to the
exon region into groups of three to get the codons, and then find the
corresponding amino acid sequence using the genetic code. In doing this, I
want something like 'A/G' to be treated as a single character. 'ATC/G' will
then correspond to two alternative amino acids, coded by ATC and ATG.
[ATG (DNA) corresponds to AUG (mRNA).]
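The step of turning one ambiguous codon into its alternative amino acids can be sketched with itertools.product, which enumerates every combination of the choices at each position. This sketch assumes uppercase A/C/G/T and SNPs written as two bases joined by a slash. CODON_TABLE covers only the handful of codons in this thread, not the full standard genetic code. The names expand and translations are hypothetical.

```python
import re
from itertools import product

# A base is either a two-base alternative such as 'C/G' or a single letter.
BASE = re.compile(r'[ACGT]/[ACGT]|[ACGT]')

# Partial table: only the codons that appear in this thread. A real program
# would use all 64 codons of the standard genetic code.
CODON_TABLE = {
    'ATC': 'I', 'ATG': 'M', 'ACT': 'T',
    'GAG': 'E', 'CAG': 'Q', 'TAG': '*',  # '*' marks a stop codon
}

def expand(codon):
    """Return every concrete codon an ambiguous codon like 'ATC/G' stands for."""
    options = [base.split('/') for base in BASE.findall(codon)]
    return [''.join(choice) for choice in product(*options)]

def translations(codon):
    """Return the one-letter amino acid (or '*') for each alternative codon."""
    return [CODON_TABLE[c] for c in expand(codon)]

print(expand('ATC/G'))        # ['ATC', 'ATG']
print(translations('ATC/G'))  # ['I', 'M']  (isoleucine or methionine)
print(expand('C/TAG'))        # ['CAG', 'TAG']
```

A codon can contain more than one SNP. For example, two SNPs give four combinations, so expand returns a list rather than a fixed pair.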

On Thu, Aug 5, 2010 at 9:31 PM, Glen Jarvis <glen at glenjarvis.com> wrote:

> Vikram,
>
>     I recognize this domain in many of the questions that have been asked.
> There are several times where I've thought, "That *so* isn't the most ideal
> 'Computer Science' way to do something." But, I also recognize that,
> especially in the Biological world, we have no control over how we receive the
> data and thus, we still have to solve problems like those reviewed.
>
>    So, I normally don't challenge the base assumption in the question
> because I know from experience that we don't always get the most ideal inputs to
> work with. HOWEVER, I do want to challenge this one because I know there's a
> standard way that this is represented in the Biological community without
> using three characters for a single base. I recognize your original question
> of z = 'AT/CG' to mean, in biological terms, that:
>
> "Zee equals the string of three nucleotide bases. The first base is
> Adenine. The second base is either Thymine or Cytosine. The third base is
> Guanine."
>
> There's a *much* better (and commonly accepted) way to represent this.
>
> The way this is traditionally represented is with the extended
> genetic alphabet (
> http://www.hrbc-genomics.net/training/bcd/Curric/PrwAli/node7.html). In
> this case, the middle base would be represented by the letter Y as that
> means either Thymine or Cytosine.
>
> I feel it's much better to represent this as:
>
> z = 'AYG'
>
> Then the string will work without any special manipulation. I would
> always work with this alphabet and not put the three-character string back
> in, as the alphabet is defined and accepted in the community. However, if
> one wanted to, they could still later map it back with a 'lookup
> dictionary' such as the following, if the output ever needed to be in the
> format in question.
>
> lookup = {'R': 'G/A',
>           'Y': 'T/C',
>           'M': 'A/C',....}
>
> Cheers,
>
>
> Glen
>
>
> On Wed, Aug 4, 2010 at 9:37 PM, Vikram K <kpguy1975 at gmail.com> wrote:
>
>> Suppose i have this string:
>> z = 'AT/CG'
>>
>> How do i get this list:
>>
>> zlist = ['A','T/C','G']
>>
>>
>> _______________________________________________
>> Baypiggies mailing list
>> Baypiggies at python.org
>> To change your subscription options or unsubscribe:
>> http://mail.python.org/mailman/listinfo/baypiggies
>>
>
>
>
> --
> Whatever you can do or imagine, begin it;
> boldness has beauty, magic, and power in it.
>
> -- Goethe
>
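Glen's suggestion above can also be sketched in a few lines. Each 'X/Y' pair is replaced with its IUPAC ambiguity letter so every base is one character, and ordinary slicing then yields the codons. This assumes two-base alternatives of distinct uppercase A/C/G/T letters. The dictionary covers only the six two-base codes (R, Y, S, W, K, M), and to_iupac and codons are illustrative names.

```python
import re

# The six IUPAC codes for two-base ambiguities. frozenset keys make the
# order irrelevant, so 'T/C' and 'C/T' both map to 'Y'.
IUPAC = {
    frozenset('AG'): 'R', frozenset('CT'): 'Y',
    frozenset('CG'): 'S', frozenset('AT'): 'W',
    frozenset('GT'): 'K', frozenset('AC'): 'M',
}

def to_iupac(seq):
    """Replace every 'X/Y' alternative with its single IUPAC letter."""
    return re.sub(r'([ACGT])/([ACGT])',
                  lambda m: IUPAC[frozenset(m.group(1) + m.group(2))],
                  seq)

def codons(seq):
    """With one character per base, plain slicing gives the codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

print(to_iupac('AT/CG'))   # AYG
z = to_iupac('ATC/GACTGAGC/TAG')
print(z)                   # ATSACTGAGYAG
print(codons(z))           # ['ATS', 'ACT', 'GAG', 'YAG']
```

Converting back to the slash format for output is what Glen's lookup dictionary is for.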

