[Baypiggies] Hiring / Bioinformatics Tutor/Hack day

Greg Cheong gregcheong at gmail.com
Thu May 13 20:02:22 CEST 2010


+1

-Greg C.

On Wed, May 12, 2010 at 5:21 PM, Glen Jarvis <glen at glenjarvis.com> wrote:
> This email covers two topics (although they can be, but don't have to be,
> inter-related):
> * A job opening
> * A tutor/hack day to give the computer scientists a real Bioinformatics
> problem to solve
> I put them together as a benefit for those who may be considering a job in
> this field. You can have a day to work on these types of problems to see if
> it interests you or bores you to tears.
>
> === Job Opening ===
> Some time back, I sent out an email regarding my bioinformatics lab hiring a
> programmer. I tried to give a feel for what work would be like on a daily
> basis. And, I tried to set your expectation for pay (less than industry).
> We still have that job opening -- probably because I set your expectation so
> well  :( .
> I was intentionally not involved in the interviewing/hiring process because
> I wanted to have no appearance of impropriety (as I was also interviewing
> for a position to move from contractor to full time employee). So, if you
> weren't hired, I don't really know why.... I intentionally stayed out of
> that loop to keep as professional as possible. I only know the position is
> still open.
> With that said, my boss is talking about hiring another programmer again for
> a short term (possibly a year or less), although if it works out on both
> sides, it could turn into a permanent position (as it was for me - I was
> hired full time). Finding a fit for this position is actually difficult (on
> both sides).
> Sooooooo......  I'm going to stick my neck out and try something new:
> Working on a small bioinformatics problem in an open source environment.
>
> === Tutor/Hack day ===
> I've been wanting to get the open source community more involved with some
> of the problems that we're tackling. Open Source code is *so* much better
> than code reviewed by only a few eyes. And, this would also give everyone a
> chance to see what a problem would be like.
> There are some *real* bioinformaticians on this list (I don't yet consider
> myself on that level -- although I'm getting there). So, if you're a
> real bioinformatician, this may be a trivial problem for you. But, if you
> want to come and help explain things/help others work this out, that'd be
> cool!
> I'd like to get together (on a weekend, possibly) and hack on this problem.
> I will describe the things that I think you need to know:
> * What is FASTA format (http://www.ncbi.nlm.nih.gov/blast/fasta.shtml)
> * A brief introduction to BioPython (http://biopython.org/)
> * What is a genome
> * What is a gene
> * What are amino acids (contrasting against DNA data)
> * What is a 'percent identity' between genes
> * What is a species
> * What is a strain (loosely defined because it seems to be very loose in
> this problem)
> * The term taxa (plural) and taxon (singular)
> * How can genes vary and still be the same gene
> * How errors can exist in different databases
> * An introduction to the JGI (http://www.jgi.doe.gov/) database
> * An introduction to the UniProt (http://www.uniprot.org/) database
>
> With this introduction, you should have a theoretical understanding of all
> that you need to solve this problem -- the rest is coding. (That is, if I do
> my job and explain things well -- and don't fall into potholes of
> information that I don't know).... Also, I oversimplified things that you
> don't need to know for this problem (e.g., We won't talk about open reading
> frames at all or what that means. Since we're already given amino acids, we
> don't care).
> The problem is:
> I will give you a file in FASTA format of the genes for a particular species
> (let's say: Chlamydophila pneumoniae). That file will contain a list of
> genes, one after the other, again in FASTA format. The file will have the
> JGI unique identifiers. However, we also want the UniProt identifier for
> this same gene.
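
To make the input concrete, here is a minimal sketch of reading such a file with only the Python standard library. The record identifiers (jgi_0001, jgi_0002) and the sequences are made up for illustration; they are not real JGI data, and the real headers may be laid out differently. Biopython's Bio.SeqIO.parse(handle, "fasta") would be the more robust choice in practice.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record's identifier (first word after '>') to its sequence."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: keep only the first whitespace-delimited token.
            header = line[1:].split()
            current_id = header[0] if header else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so join them together.
            records[current_id] += line
    return records


# Hypothetical two-record file; identifiers and sequences are invented.
example = """>jgi_0001 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>jgi_0002 another hypothetical protein
MSLNPEQ
"""

genes = parse_fasta(example.splitlines())
for gene_id, sequence in genes.items():
    print(gene_id, len(sequence))
```

With a real file, the same function could be fed an open file handle, since iterating over a handle also yields lines.
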
> Now, this should be as simple as: "Take the gene from the JGI database,
> look-up the same gene in UniProt, record the number, dust off your hands -
> you're done" -- There are lots of little tedious problems, however, that
> keep it from being this easy.
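
The "as simple as" version of the lookup can be sketched as an exact-sequence match. This assumes both databases have already been loaded into plain dictionaries of identifier to amino acid sequence; every identifier below is invented, and this is only the naive baseline, not the lab's actual method.

```python
from collections import defaultdict
from typing import Dict, List


def map_by_exact_sequence(
    jgi: Dict[str, str], uniprot: Dict[str, str]
) -> Dict[str, List[str]]:
    """For each JGI identifier, list the UniProt identifiers whose
    sequence is character-for-character identical."""
    # Index UniProt entries by sequence so each lookup is one dict access.
    by_sequence: Dict[str, List[str]] = defaultdict(list)
    for uniprot_id, sequence in uniprot.items():
        by_sequence[sequence].append(uniprot_id)
    return {
        jgi_id: sorted(by_sequence.get(sequence, []))
        for jgi_id, sequence in jgi.items()
    }


# Hypothetical data: all identifiers and sequences are made up.
jgi_genes = {"jgi_0001": "MKTAYIAKQR", "jgi_0002": "MSLNPEQ"}
uniprot_genes = {
    "UP_A": "MKTAYIAKQR",  # identical to jgi_0001
    "UP_B": "MSLNPEK",     # differs from jgi_0002 at the last position
}

mapping = map_by_exact_sequence(jgi_genes, uniprot_genes)
print(mapping)
```

An empty list for a gene is exactly where the tedious cases start: a near-identical sequence, or a sequence deposited under a different strain, is missed by an exact match.
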
> For example, if two genes are absolutely identical (they have the same amino
> acid sequence) except at a single position, are they actually identical?
> What if the sequence found was in a strain instead of from the original
> exact species?
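
The single-position case can be put into numbers with a simplified percent identity. This sketch assumes the two sequences are already the same length and compares them position by position; real tools align the sequences first and may define the denominator differently, so treat it as an illustration only. The sequences are made up.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree.

    Simplifying assumption: no alignment is performed, so insertions or
    deletions are not handled.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length for this sketch")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Two invented 10-residue sequences that differ only at position 6 (I vs L).
identity = percent_identity("MKTAYIAKQR", "MKTAYLAKQR")
print(identity)
```

Whether a given percentage is close enough to call two entries "the same gene" is a judgment call that depends on the species and strain involved, which is the question raised above.
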
> Let me ask another question: If you were to somehow magically sequence your
> personal entire genome (everything - not just genes) from a cell in your toe
> and also sequence your entire genome from a cell from your nose, would they
> be identical?  I bet not... I'll explain why. Now, we expect fewer
> differences in actual genes (not in other parts of your genome), but even
> then, there can be some variation...
> These are the types of questions/problems that we'll be getting into if
> you're so interested...
> Who's up for this?  We'll set a date and time once we have a set of interested
> people...
> You don't have to be interested in this job to be interested in this problem
> (and/or to do more in bioinformatics).
>
> Cheers,
>
>
> Glen
> --
> Whatever you can do or imagine, begin it;
> boldness has beauty, magic, and power in it.
>
> -- Goethe
>
> _______________________________________________
> Baypiggies mailing list
> Baypiggies at python.org
> To change your subscription options or unsubscribe:
> http://mail.python.org/mailman/listinfo/baypiggies
>

