I thought a bit more about this and have a further comment. The genome data files I am working with represent a SNP in the form A/T, A/G, C/G, etc. It makes more sense *not* to convert T/C to Y, because what I ultimately want is the amino acid: AT/CG should be expanded into ATG and ACG, and those two codons should then be translated into their corresponding amino acids using the genetic code. <br>
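<br>A minimal sketch of that expansion and translation, assuming each SNP site is written as two single-base alleles separated by a slash (as in AT/CG) and that the standard genetic code applies. The names expand_snp and CODON_TABLE are made up for illustration and do not come from any library:<br>
<pre>
```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand_snp(seq):
    """Expand a string such as 'AT/CG' into every sequence it can stand for.

    Assumption: a slash always sits between exactly two single-base alleles.
    """
    positions = []  # one tuple of possible bases per sequence position
    i = 0
    while i != len(seq):
        if seq[i + 1:i + 2] == "/":
            positions.append((seq[i], seq[i + 2]))
            i += 3
        else:
            positions.append((seq[i],))
            i += 1
    return ["".join(p) for p in product(*positions)]


codons = expand_snp("AT/CG")
residues = [CODON_TABLE[c] for c in codons]
print(codons, residues)
```
</pre>
For AT/CG this gives the codons ATG and ACG, which the standard code translates to M (methionine) and T (threonine). Both alleles stay visible throughout, which is the point of not collapsing them into a single letter.<br>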
<br><br><div class="gmail_quote">On Fri, Aug 6, 2010 at 12:18 PM, Vikram K <span dir="ltr"><<a href="mailto:kpguy1975@gmail.com">kpguy1975@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="im">Because if you represent a SNP with a single character you would not be able to distinguish between homozygous and heterozygous SNPs. <br><br></div><div><div></div><div class="h5"><div class="gmail_quote">
On Fri, Aug 6, 2010 at 11:14 AM, Glen Jarvis <span dir="ltr"><<a href="mailto:glen@glenjarvis.com" target="_blank">glen@glenjarvis.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Vikram,<div><br><div> Thank you for this. I really appreciate it. I didn't catch that this was SNP data. And I do see SNP data represented as, for example, 'C/G'. </div>
<div><br></div><div> I see the IUPAC extended genetic alphabet used when we are uncertain in DNA modeling. For example, if I were to see 'ACGRC', I would know that this means either 'ACGGC' or 'ACGAC', because R has a built-in meaning in this extended genetic alphabet:</div>
<div><br></div><div><font face="'courier new', monospace">Symbol Meaning</font></div><div><font face="'courier new', monospace">G G</font></div><div>
<font face="'courier new', monospace">A A</font></div><div><font face="'courier new', monospace">T T</font></div><div><font face="'courier new', monospace">C C</font></div>
<div><font face="'courier new', monospace">R G or A</font></div><div><font face="'courier new', monospace">Y T or C</font></div><div><font face="'courier new', monospace">M A or C</font></div>
<div><font face="'courier new', monospace">K G or T</font></div><div><font face="'courier new', monospace">S G or C</font></div><div><font face="'courier new', monospace">W A or T</font></div>
<div><font face="'courier new', monospace">H A, C or T</font></div><div><font face="'courier new', monospace">B G, T or C</font></div><div><font face="'courier new', monospace">V G, C or A</font></div>
<div><font face="'courier new', monospace">D G, A or T</font></div><div><font face="'courier new', monospace">N any of the four bases</font></div>
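<div><br></div><div> As a rough illustration of how the table above can be applied, here is a small sketch that expands a string containing IUPAC symbols into every plain sequence it can stand for. The names IUPAC and expand_iupac are hypothetical, chosen only for this example:</div>
<div><pre>
```python
from itertools import product

# Each IUPAC symbol mapped to the bases it can represent (from the table above).
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT", "N": "ACGT",
}


def expand_iupac(seq):
    """Return every unambiguous sequence that the IUPAC string can mean."""
    return ["".join(p) for p in product(*(IUPAC[ch] for ch in seq))]


print(expand_iupac("ACGRC"))
```
</pre></div>
<div> For 'ACGRC' this yields 'ACGGC' and 'ACGAC', matching the reading given above.</div>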
<div><br></div><div> With that said, I don't see SNP data represented this way, and I don't yet understand why not. I don't see why this notation can be used for the data I see in FASTA files but not for SNPs. </div>
<div><br></div><div> From a computer science perspective, it makes much more sense to me to store this in an 'already tokenized form where the tokens are easy to parse' (that is, each letter in a string representing a token already).</div>
<div><br></div><div> Using your previous example, you want to represent 'A/G' as a single character. That is 'R' as defined by the IUPAC extended genetic alphabet. Thus, the string 'ARG' stands for either 'AAG' or 'AGG', which correspond to the mRNA codons AAG and AGG and thus to a lysine or an arginine residue.</div>
<div><br></div><div> For what it's worth, I work in a phylogenomics lab at UC Berkeley and we deal with proteins instead of DNA. So, I may be missing something. Our mapping is similar (for example, the amino acid symbol X for any amino acid, or B for either asparagine or aspartic acid). I totally don't get how to read SNPs, so, I admit, I could be missing something big.</div>
<div><br></div><div> I'm very curious why the IUPAC extended genetic alphabet is not applicable to the relevant portions of SNPs. Do you know why this is the case? Could you explain what I'm missing on why this couldn't be represented this way?</div>
<div><br></div><div><br></div><div>Cheers,</div><div><br></div><font color="#888888"><div><br></div><div>Glen</div></font><div><div></div><div><div> </div><br></div></div></div></blockquote></div></div></div></blockquote>
</div><br>