[Tutor] code efficiency and biological databases

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Thu Apr 24 18:16:02 2003

On Thu, 24 Apr 2003 pan@uchicago.edu wrote:

> Thx Danny for pointing out the rate limiting step in the code I
> presented earlier.

The computer scientist Alan Perlis once quipped: "Lisp programmers know
the value of everything, and the cost of nothing."  Let's make sure that
that generalization doesn't apply so strongly to Python programmers.

> I am heading toward the world of genome/evolution analysis

Very cool!  Yes, biologists often have to deal with enormous databases, so
I think it pays to be aware of program efficiency.

The Institute for Genomic Research (TIGR) keeps a repository of many
genomes available on their FTP site; what's sorta neat is that a lot of
their data is in XML.  But what sorta sucks is that a lot of their data is
in XML.  *grin*

If you're ever interested in the model organism 'Arabidopsis thaliana',
you can check out a concrete example of a medium-sized dataset:


I'm using the 'gzip' and 'pulldom' modules to open and parse out
individual sections of each "Bacterial Artificial Chromosome" at work.
But the library documentation on 'pulldom' is so laughably sparse at the
moment --- I'm thinking of writing a small tutorial on it when I get the
chance.
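
For the curious, here is a minimal sketch of how 'gzip' and 'pulldom' can
be combined to pull one section at a time out of a compressed XML file.
It is not the actual code from work: the element names ASSEMBLY and GENE,
the sample data, and the helper names iter_sections() and summarize() are
all made up for illustration, and it assumes a modern Python 3.

```python
import gzip
import io
from xml.dom import pulldom

# Hypothetical stand-in data: the element and attribute names below are
# invented for this sketch and are not taken from any real TIGR file.
SAMPLE_XML = b"""<?xml version="1.0"?>
<ASSEMBLIES>
  <ASSEMBLY id="bac1"><GENE name="g1"/><GENE name="g2"/></ASSEMBLY>
  <ASSEMBLY id="bac2"><GENE name="g3"/></ASSEMBLY>
</ASSEMBLIES>
"""


def iter_sections(stream, tag):
    """Yield each <tag> element as a small DOM subtree, one at a time."""
    events = pulldom.parse(stream)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Build a DOM only for this element and its children; the
            # rest of the document is never held in memory at once.
            events.expandNode(node)
            yield node


def summarize(gz_bytes):
    """Return (section id, gene names) pairs from gzipped XML bytes."""
    # gzip.open() accepts a file object as well as a filename, so the
    # same code would work on a real file opened from disk.
    with gzip.open(io.BytesIO(gz_bytes), "rb") as stream:
        return [
            (section.getAttribute("id"),
             [gene.getAttribute("name")
              for gene in section.getElementsByTagName("GENE")])
            for section in iter_sections(stream, "ASSEMBLY")
        ]


if __name__ == "__main__":
    for bac_id, genes in summarize(gzip.compress(SAMPLE_XML)):
        print(bac_id, genes)
```

The point of expandNode() is that only the section currently being looked
at gets turned into a DOM tree, which is what makes this approach
reasonable for files too big to load whole.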

Sorry for being so off topic; I just like talking about my work... *grin*
Talk to you later!