I am interested in this tutor/hack day.<br><br><div class="gmail_quote">On Wed, May 12, 2010 at 5:21 PM, Glen Jarvis <span dir="ltr"><<a href="mailto:glen@glenjarvis.com">glen@glenjarvis.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div>This email covers two topics (which can be, but don't have to be, interrelated):</div><div><br></div><div>* A job opening</div><div>* A tutor/hack day to give the computer scientists a real bioinformatics problem to solve</div>
<div><br></div><div>I've put them together for the benefit of anyone who may be considering a job in this field: you can spend a day working on these types of problems to see whether they interest you or bore you to tears.</div><div>
<br></div><div><br></div><div>=== Job Opening ===</div>Some time back, I sent out an email about my bioinformatics lab hiring a programmer. I tried to give a feel for what the work would be like day to day, and to set your expectations for pay (less than industry).<div>
<br></div><div>We still have that job opening, probably because I set your expectations so well :( </div><div><br></div><div>I deliberately stayed out of the interviewing/hiring process to avoid any appearance of impropriety (I was also interviewing at the time, to move from contractor to full-time employee). So, if you weren't hired, I don't really know why; I stayed out of that loop to keep things as professional as possible. I only know that the position is still open.</div>
<div><br></div><div>That said, my boss is talking about hiring another programmer for a short term (possibly a year or less). If it works out on both sides, though, it could turn into a permanent position (as it did for me; I was hired full time). Finding the right fit for this position is genuinely difficult, for both sides.</div>
<div><br></div><div>Sooooooo...... I'm going to stick my neck out and try something new: Working on a small bioinformatics problem in an open source environment.</div><div><br></div><div><br></div><div>=== Tutor/Hack day ===</div>
<div>I've been wanting to get the open source community more involved in some of the problems we're tackling. Open source code is *so* much better than code reviewed by only a few eyes. This would also give everyone a chance to see what this kind of problem is like.</div>
<div><br></div><div>There are some *real* bioinformaticians on this list (I don't consider myself at that level yet, although I'm getting there). If you're one of them, this may be a trivial problem for you, but if you'd like to come and help explain things or help others work through it, that'd be cool!</div>
<div><br></div><div>I'd like to get together (possibly on a weekend) and hack on this problem. I'll start by describing the things I think you need to know:</div><div><br></div><div>* What is FASTA format (<a href="http://www.ncbi.nlm.nih.gov/blast/fasta.shtml" target="_blank">http://www.ncbi.nlm.nih.gov/blast/fasta.shtml</a>)</div>
<div>* A brief introduction to Biopython (<a href="http://biopython.org/" target="_blank">http://biopython.org/</a>)</div><div>* What is a genome </div><div>* What is a gene</div><div>* What are amino acids (in contrast to DNA data)</div>
<div>* What is a 'percent identity' between genes</div>
<div>* What is a species</div><div>* What is a strain (loosely defined, because the term is used very loosely in this problem)</div><div>* The terms taxon (singular) and taxa (plural)</div><div>* How genes can vary and still be the same gene</div>
<div>* How errors can exist in different databases</div><div>* An introduction to the JGI (<a href="http://www.jgi.doe.gov/" target="_blank">http://www.jgi.doe.gov/</a>) database</div><div>* An introduction to the UniProt (<a href="http://www.uniprot.org/" target="_blank">http://www.uniprot.org/</a>) database</div>
<div><br></div><div><br></div><div>With this introduction, you should have the theoretical understanding you need to solve this problem; the rest is coding. (That is, if I do my job and explain things well, and don't fall into potholes of information I don't know.) I've also oversimplified things you don't need to know for this problem; for example, we won't talk about open reading frames at all, or what they mean. Since we're already given amino acid sequences, we don't care about them.</div>
<div><br></div><div>The problem is: </div><div><br></div><div>I will give you a file in FASTA format containing the genes for a particular species (say, <i>Chlamydophila pneumoniae</i>). The file will list the genes one after another, each as a FASTA record labeled with its JGI unique identifier. However, we also want the UniProt identifier for each of these same genes.</div>
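<div><br></div><div>To give a feel for the input side, here is a minimal, stdlib-only sketch of reading such a file into a dictionary of identifier to sequence. It assumes the identifier is the first word after '&gt;' on each header line (Biopython's SeqIO.parse(handle, "fasta") follows the same convention and is what you'd likely use in practice). The helper names and the gene_0001-style records are hypothetical, made up for illustration; they are not real JGI data, and real JGI headers may be laid out differently.</div>
<pre>
```python
import io
from typing import Dict, Iterator, List, TextIO, Tuple


def _make_record(header: str, chunks: List[str]) -> Tuple[str, str, str]:
    """Split a header into (identifier, description) and join the sequence."""
    parts = header.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description, "".join(chunks).upper()


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, description, sequence) for each FASTA record."""
    header = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)  # a sequence may wrap over several lines
    if header is not None:
        yield _make_record(header, chunks)


def sequences_by_id(handle: TextIO) -> Dict[str, str]:
    """Map each record identifier (e.g. a JGI gene ID) to its sequence."""
    return {ident: seq for ident, _desc, seq in read_fasta(handle)}


if __name__ == "__main__":
    # Hypothetical records; real JGI headers may be laid out differently.
    example = io.StringIO(
        ">gene_0001 hypothetical protein A\n"
        "MKTAYIAKQR\n"
        "QISFVKSHFS\n"
        ">gene_0002 hypothetical protein B\n"
        "MSLNRLLQEA\n"
    )
    print(sequences_by_id(example))
```
</pre>
<div>Running it prints {'gene_0001': 'MKTAYIAKQRQISFVKSHFS', 'gene_0002': 'MSLNRLLQEA'}; note that the wrapped sequence lines of the first record are joined into one string.</div>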
<div><br></div><div>Now, this should be as simple as: "Take the gene from the JGI database, look up the same gene in UniProt, record the number, dust off your hands; you're done." However, lots of small, tedious problems keep it from being this easy.</div>
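<div><br></div><div>The "dust off your hands" version can be sketched as an exact-sequence lookup. This sketch assumes both sides have already been loaded into dictionaries of identifier to amino acid sequence; fetching the UniProt sequences is left out entirely, and exact_match_lookup and the UP_A/UP_B accessions are hypothetical placeholders, not real UniProt entries. An empty list marks a gene with no identical match and more than one accession marks an ambiguous one: those are exactly the tedious cases.</div>
<pre>
```python
from typing import Dict, List


def exact_match_lookup(
    jgi_seqs: Dict[str, str], uniprot_seqs: Dict[str, str]
) -> Dict[str, List[str]]:
    """For each JGI ID, list the UniProt accessions with an identical sequence.

    An empty list means no exact match; more than one entry means the
    match is ambiguous. Both cases need a human (or smarter code) to decide.
    """
    # Index UniProt entries by sequence; several accessions may share one.
    by_sequence: Dict[str, List[str]] = {}
    for accession, seq in uniprot_seqs.items():
        by_sequence.setdefault(seq.upper(), []).append(accession)

    return {
        jgi_id: list(by_sequence.get(seq.upper(), []))
        for jgi_id, seq in jgi_seqs.items()
    }


if __name__ == "__main__":
    # Hypothetical IDs and sequences, for illustration only.
    jgi = {"gene_0001": "MKTAYIAKQR", "gene_0002": "MSLNRLLQEA"}
    uniprot = {"UP_A": "MKTAYIAKQR", "UP_B": "MSLNRLLQEV"}
    print(exact_match_lookup(jgi, uniprot))
```
</pre>
<div>This prints {'gene_0001': ['UP_A'], 'gene_0002': []}: gene_0002 finds nothing, even though UP_B differs from it at only one position.</div>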
<div><br></div><div>For example, if two genes have exactly the same amino acid sequence except at a single position, are they really the same gene? What if the matching sequence came from a strain rather than from the exact species we started with?</div>
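<div><br></div><div>One way to make the single-position question concrete is percent identity: roughly, the share of aligned positions at which two sequences agree. The sketch below assumes the sequences are already aligned and of equal length; real tools such as BLAST perform the alignment (including insertions and deletions), which this skips entirely. The best_candidate helper and its min_identity cutoff are hypothetical, and choosing that cutoff is precisely the kind of judgement call we'd be discussing.</div>
<pre>
```python
from typing import Dict, Optional, Tuple


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences agree.

    Assumes both sequences are already aligned to the same length; a real
    comparison would align them first (handling insertions and deletions).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


def best_candidate(
    query: str, candidates: Dict[str, str], min_identity: float
) -> Optional[Tuple[str, float]]:
    """Return (accession, identity) for the closest same-length candidate.

    Candidates of a different length are skipped rather than aligned, and
    None is returned if nothing reaches min_identity (a hypothetical cutoff).
    """
    best: Optional[Tuple[str, float]] = None
    for accession, seq in candidates.items():
        if len(seq) != len(query):
            continue  # would need a real alignment; out of scope here
        identity = percent_identity(query, seq)
        if identity >= min_identity and (best is None or identity > best[1]):
            best = (accession, identity)
    return best


if __name__ == "__main__":
    # Hypothetical sequences differing only at the last position.
    print(percent_identity("MSLNRLLQEA", "MSLNRLLQEV"))
    print(best_candidate("MSLNRLLQEA", {"UP_B": "MSLNRLLQEV"}, min_identity=90.0))
```
</pre>
<div>Two 10-residue sequences that differ at one position come out at 90.0% identity, so whether they "match" depends entirely on where the cutoff is set.</div>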
<div><br></div><div>Let me ask another question: if you were to somehow magically sequence your entire personal genome (everything, not just genes) from a cell in your toe, and also sequence your entire genome from a cell in your nose, would they be identical? I bet not... I'll explain why. We'd expect fewer differences in the actual genes (as opposed to other parts of your genome), but even there, there can be some variation... </div>
<div><br></div><div>These are the types of questions/problems we'll be getting into, if you're interested...</div><div><br></div><div>Who's up for this? We'll pick a date and time once we have a group of interested people... </div>
<div><br></div><div>You don't have to be interested in this job to be interested in this problem (and/or to do more in bioinformatics).</div><div><br></div><div><br></div><div>Cheers,</div><div><br></div><div><br></div>
<div><br></div><div>Glen</div><div><div>
-- <br>Whatever you can do or imagine, begin it;<br>boldness has beauty, magic, and power in it.<br><br>-- Goethe <br>
</div></div>
<br>_______________________________________________<br>
Baypiggies mailing list<br>
<a href="mailto:Baypiggies@python.org">Baypiggies@python.org</a><br>
To change your subscription options or unsubscribe:<br>
<a href="http://mail.python.org/mailman/listinfo/baypiggies" target="_blank">http://mail.python.org/mailman/listinfo/baypiggies</a><br></blockquote></div><br>