[BangPypers] Parsing data

Ananya Sharma as1438 at msstate.edu
Sat Sep 17 06:26:34 CEST 2011


Hey,

I am just a beginner in Python. I have to write a script to parse a data
file. I have written one, but it is not working. Can anybody please help
me fix it? Let me explain the task I want to accomplish with this
script.

I have two files, A and B. File A has ~1450 names, each of which has a
corresponding value in file B. File B also has some extra data that I do
not need in my result (there are ~2700 values in file B). A preview of
these files and my script is given below --

*File A-*
>PSUB.GBD61H402FPT34:0-372
>PSUB.GBD61H401EQ8PG:0-365
>PSUB.GBD61H401AV35C:0-423
>PSUB.GBD61H401EWL7A:0-442
>PSUB.GBD61H401CWYC8:0-284

*File B-*
>PSUB.GBD61H402FPT34:0-372
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNCATTTCCTTGAGTATTAGGCCATTCATGCTGTCAATTTTCTTAACT
ATTTGGAAATCCTAGTTGTACAAGATGGCCTTTTTCCCACCTGTATTTGC
TTGGTCTGTGTACTGTAGTCTGCCTCTGCAAATGTTGTGGGAGGACTAAA
TGTGGCGGGGGTGGGCTGACAG
>PSUB.GBD61H401EQ8PG:0-365
GTTCTGTATTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXGTTGTGTATTAAAAACAATGCA
ATTCTTGGGCAAAGCAGTATGATTGGTTGTGTTTAAGATATCATAGATTC
TGCATACCAGAGCATTTGAGTAAGAAATGCATTTACTAGTAATTATTTTC
ACCCCTTAAAGAAGT
>PSUB.GBD61H401DZU65:0-306
TAAACCATGGAGATCAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXCAAAAGAGGGATATTTAGTTA
GAAAAAAAACCCCACATGGAAGTAATTTTAAACGACCCTGTTGATTTGTT
ATACAG
>PSUB.GBD61H401DLDLT:0-387
ATTTTAAATGTACATCTCATTTAAAGGATTTTTTCCCTAAAGAATTGGAA
ACCGXXXXXXXXXXXXXXX


*Script-*
f1 = open('fileA', 'r')
f2 = open('fileB', 'r')
a = ""
b = ""
for n in f1:
    while not b.startswith(n):
        b = f2.readline()
    if len(a) > 0:
        print a
    b = ""
    while not b.startswith(">"):
        a = a + f2.readline() + "__"
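
For comparison, here is a minimal sketch of one way the task described above could be approached. It is not a corrected version of the script: it reads every name from file A into a set, then walks through file B once and keeps only the records whose header line is in that set. The function names `read_names` and `filter_records` are hypothetical. The sketch is written for Python 3, assumes a header in file B matches a name in file A exactly once surrounding whitespace is stripped, and joins each record's lines into one string (the post does not say whether line breaks should be kept). The sample data are shortened fragments of the previews above, not the full records.

```python
def read_names(lines):
    """Collect the header names (lines starting with '>') from file A."""
    return {line.strip() for line in lines if line.startswith(">")}


def filter_records(lines, wanted):
    """Yield (name, sequence) for each file B record whose name is in wanted."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record.
            if name in wanted:
                yield name, "".join(chunks)
            name, chunks = line, []
        else:
            chunks.append(line)
    # The last record has no following header to close it.
    if name in wanted:
        yield name, "".join(chunks)


# Shortened stand-ins for the two files (assumption: real input would be
# open file objects, which also iterate line by line).
file_a = [">PSUB.GBD61H402FPT34:0-372\n", ">PSUB.GBD61H401EQ8PG:0-365\n"]
file_b = [
    ">PSUB.GBD61H402FPT34:0-372\n",
    "NNNNCATTTCC\n",
    "TGTGGCGGG\n",
    ">PSUB.GBD61H401DZU65:0-306\n",
    "TAAACCATGG\n",
    ">PSUB.GBD61H401EQ8PG:0-365\n",
    "GTTCTGTATT\n",
]

result = list(filter_records(file_b, read_names(file_a)))
for name, seq in result:
    print(name)
    print(seq)
```

Because the names are looked up in a set, this does not depend on the two files listing their entries in the same order; the output follows the order of file B. With the real files, `open('fileA')` and `open('fileB')` can be passed in place of the two lists.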


Any help would be highly appreciated. Thanks.

Ananya

