What strategy for random accession of records in massive FASTA file?
Robert Kern
rkern at ucsd.edu
Thu Jan 13 19:41:45 EST 2005
Jeff Shannon wrote:
> (Plus, if this format might be used for RNA sequences as well as DNA
> sequences, you've got at least a fifth base to represent, which means
> you need at least three bits per base, which means only two bases per
> byte (or else base-encodings split across byte-boundaries).... That gets
> ugly real fast.)
Not to mention all the IUPAC symbols for incompletely specified bases
(e.g. R = A or G).
http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html
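To make the bit-counting concrete, here is a rough sketch (not anything from an existing library) of what a packed encoding might look like once the ambiguity codes are included. It assumes the 15 IUPAC nucleotide symbols with U left out, and it assumes a bitmask scheme in which each of A, C, G, T gets one bit and an ambiguity code is the OR of the bases it permits. The names ENCODE, DECODE, bits_needed, pack and unpack are made up for illustration.

```python
# Assumption: one bit per canonical base; an ambiguity code is the OR
# of the bases it allows, so R (A or G) = 1 | 4 = 5.
BASE_BITS = {"A": 1, "C": 2, "G": 4, "T": 8}

# The 15 IUPAC nucleotide symbols (U omitted in this sketch).
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

ENCODE = {sym: sum(BASE_BITS[b] for b in bases)
          for sym, bases in IUPAC_SETS.items()}
DECODE = {code: sym for sym, code in ENCODE.items()}


def bits_needed(n_symbols):
    """Smallest number of bits that can distinguish n_symbols values."""
    return (n_symbols - 1).bit_length()


def pack(seq):
    """Pack a sequence into bytes, two 4-bit codes per byte."""
    codes = [ENCODE[ch] for ch in seq.upper()]
    if len(codes) % 2:
        codes.append(0)  # pad the last byte; 0 is not a valid symbol code
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def unpack(data, length):
    """Inverse of pack(); length is the original number of bases."""
    codes = []
    for byte in data:
        codes.append(byte >> 4)
        codes.append(byte & 0x0F)
    return "".join(DECODE[c] for c in codes[:length])


if __name__ == "__main__":
    for n in (4, 5, len(ENCODE)):
        print(n, "symbols ->", bits_needed(n), "bits per base")
    packed = pack("ACGTRN")
    print(packed.hex(), unpack(packed, 6))
```

With four bits per symbol, the packing still comes out at two bases per byte and nothing straddles a byte boundary, but the original length has to be stored separately so the padding nibble can be dropped on decode.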
--
Robert Kern
rkern at ucsd.edu
"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter