[Baypiggies] reading files quickly and efficiently

Brent Pedersen bpederse at gmail.com
Wed Nov 17 22:24:32 CET 2010


On Wed, Nov 17, 2010 at 1:13 PM, Glen Jarvis <glen at glenjarvis.com> wrote:
> Biopython will also do all of this for you:
>>>> from Bio import SeqIO
>
>>>> record = SeqIO.read("NC_005816.fna", "fasta")
>>>> record
> SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG',
> SingleLetterAlphabet()), id='gi|45478711|ref|NC_005816.1|',
> name='gi|45478711|ref|NC_005816.1|',
> description='gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus
> ... sequence',
> dbxrefs=[])
>
> You can also look at particular fields (record.id, record.description, and
> record.seq):
>
> Look at this tutorial:
> http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc16
>
> Cheers,
>
> Glen

i agree with glen that you should use a library. however, that example
only works for a single-entry fasta file (SeqIO.read expects exactly one
record). if you want random access to a multi-fasta, use SeqIO.index in
biopython:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc56
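
a rough sketch of the random-access case, assuming biopython is installed.
the two-record fasta and the ids "seq1"/"seq2" are made up for
illustration and written to a temp file so the snippet runs on its own:

```python
import os
import tempfile

from Bio import SeqIO

# hypothetical two-record multi-fasta, written to a temp file for the demo
fasta_text = ">seq1 first example\nACGTACGT\n>seq2 second example\nTTGGCCAA\n"
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as handle:
    handle.write(fasta_text)
    fasta_path = handle.name

# SeqIO.index scans the file once and returns a dict-like object keyed by
# record id; each record is parsed only when it is looked up
index = SeqIO.index(fasta_path, "fasta")
record = index["seq2"]        # random access by id
ids = sorted(index.keys())    # all ids found in the file
print(record.id, record.description, str(record.seq))

index.close()
os.remove(fasta_path)
```

with a real file you would pass its path instead of the temp file and look
up whatever ids it actually contains.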

if you just want an iterator over the records, use SeqIO.parse:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc11
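
a rough sketch of the iterator case, again assuming biopython is
installed. the fasta contents and ids are made up for illustration;
the point is that records come back one at a time, so the whole file is
never held in memory:

```python
import os
import tempfile

from Bio import SeqIO

# hypothetical two-record multi-fasta, written to a temp file for the demo
fasta_text = ">seq1 first example\nACGTACGT\n>seq2 second example\nTTGG\n"
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as handle:
    handle.write(fasta_text)
    fasta_path = handle.name

# SeqIO.parse yields one SeqRecord per fasta entry, in file order
lengths = {}
for record in SeqIO.parse(fasta_path, "fasta"):
    lengths[record.id] = len(record.seq)
print(lengths)

os.remove(fasta_path)
```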

-brent

