[Tutor] code efficiency and biological databases

Oleksandr Moskalenko malex@tagancha.org
Fri Apr 25 14:24:01 2003


* Danny Yoo <dyoo@hkn.eecs.berkeley.edu> [2003-04-24 15:15:53 -0700]:

> On Thu, 24 Apr 2003 pan@uchicago.edu wrote:
> > Thx Danny for pointing out the rate limiting step in the code I
> > presented earlier.
> The computer scientist Alan Perlis once quipped: "Lisp programmers know
> the value of everything, and the cost of nothing."  Let's make sure that
> that generalization doesn't apply so strongly to Python programmers.
> *grin*
> > I am heading toward the world of genome/evolution analysis
> Very cool!  Yes, biologists often have to deal with enormous databases, so
> I think it can be effective to be aware of program efficiency.
> The Institute for Genomic Research (TIGR) keeps a repository of many
> genomes available on their FTP site; what's sorta neat is that a lot of
> their data is in XML.  But what sorta sucks is that a lot of their data is
> in XML.  *grin*
> If you're ever interested in the model organism 'Arabidopsis thaliana',
> you can check out a concrete example of a medium-sized dataset:
>     ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/BACS/
> I'm using the 'gzip' and 'pulldom' modules to open and parse out
> individual sections of each "Bacterial Artificial Chromosome" at work.
> But the library documentation on 'pulldom' is so laughably sparse at the
> moment --- I'm thinking of writing a small tutorial on it when I get the
> chance.
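
For anyone following along, here is a minimal sketch of how 'gzip' and
'pulldom' can be combined to pull individual sections out of a
compressed XML file without building the whole tree in memory. The
element names (BACS, GENE, NAME), the sample data, and the helper
functions are made up for illustration; the real TIGR files have their
own schema, and this is not necessarily how Danny's code works.

```python
import gzip
import io
from xml.dom import pulldom


def iter_elements(stream, tag):
    """Yield each <tag> element as a small DOM subtree, one at a time."""
    events = pulldom.parse(stream)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Build the DOM for this one element only; the rest of the
            # document is still processed as a stream of events.
            events.expandNode(node)
            yield node


def text_of(element):
    """Concatenate the text nodes directly under an element."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def gene_names(gz_bytes):
    """Return (id, name) pairs from gzip-compressed XML bytes."""
    # gzip.open accepts a file name or an existing file object.
    with gzip.open(io.BytesIO(gz_bytes), "rb") as stream:
        return [
            (gene.getAttribute("id"),
             text_of(gene.getElementsByTagName("NAME")[0]))
            for gene in iter_elements(stream, "GENE")
        ]


# Hypothetical sample document, NOT the real TIGR format.
SAMPLE = (
    b"<BACS>"
    b"<GENE id='g1'><NAME>alpha</NAME></GENE>"
    b"<GENE id='g2'><NAME>beta</NAME></GENE>"
    b"</BACS>"
)

if __name__ == "__main__":
    for gene_id, name in gene_names(gzip.compress(SAMPLE)):
        print(gene_id, name)
```

For a file on disk, passing its path to gzip.open() instead of the
BytesIO object should work the same way.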

This would be a great tutorial to write! You have a supporting vote from

> Sorry for being so off topic; I just like talking about my work... *grin*
> Talk to you later!


The lyf so short, the craft so long to lerne.
                                   -- Chaucer