What strategy for random accession of records in massive FASTA file?

Robert Kern rkern at ucsd.edu
Thu Jan 13 19:41:45 EST 2005


Jeff Shannon wrote:

> (Plus, if this format might be used for RNA sequences as well as DNA 
> sequences, you've got at least a fifth base to represent, which means 
> you need at least three bits per base, which means only two bases per 
> byte (or else base-encodings split across byte-boundaries).... That gets 
> ugly real fast.)

Not to mention all the IUPAC symbols for incompletely specified bases 
(e.g. R = A or G).

http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html
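
As a rough sketch of the packing arithmetic being discussed, the snippet below counts the bits needed per base for a fixed-width code and how many whole bases then fit in a byte. The three alphabets and the helper names (bits_per_base, bases_per_byte) are illustrative assumptions, not something from the thread. In particular, the IUPAC string assumes T and U are kept as distinct symbols and leaves out gap characters.

```python
# Fixed-width packing arithmetic for nucleotide alphabets.
# The alphabets are illustrative assumptions, not taken from the original posts.

DNA = "ACGT"
DNA_RNA = "ACGTU"            # T and U kept distinct, as the quoted post implies
IUPAC = "ACGTURYSWKMBDHVN"   # adds the ambiguity codes, e.g. R = A or G


def bits_per_base(alphabet):
    """Smallest whole number of bits that gives every symbol its own code."""
    return (len(set(alphabet)) - 1).bit_length()


def bases_per_byte(alphabet):
    """How many whole codes fit in one byte without straddling a boundary."""
    return 8 // bits_per_base(alphabet)


for name, alphabet in [("DNA", DNA), ("DNA+RNA", DNA_RNA), ("IUPAC", IUPAC)]:
    print(name, len(alphabet), bits_per_base(alphabet), bases_per_byte(alphabet))
```

Under these assumptions the 16-symbol IUPAC alphabet needs four bits per base, which still packs two bases per byte with nothing split across byte boundaries. Adding even one more symbol, such as a gap character, would push a fixed-width code to five bits.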

-- 
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
  Are the graves of dreams allowed to die."
   -- Richard Harter


