[BangPypers] Parsing data
Ananya Sharma
as1438 at msstate.edu
Sat Sep 17 06:26:34 CEST 2011
Hey,
I am just a beginner in Python. I have to write a script to parse a data
file. I have written one, but it is not working. Can anybody please help
me fix it? Let me explain the task I want to accomplish with this
script.
I have 2 files, A and B. File A has ~1450 names, each of which has a
corresponding value in file B. File B also has some extra data which I do
not need in my result (there are ~2700 values in file B). A preview of
these files and my script is given below:
*File A-*
>PSUB.GBD61H402FPT34:0-372
>PSUB.GBD61H401EQ8PG:0-365
>PSUB.GBD61H401AV35C:0-423
>PSUB.GBD61H401EWL7A:0-442
>PSUB.GBD61H401CWYC8:0-284
*File B-*
>PSUB.GBD61H402FPT34:0-372
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNCATTTCCTTGAGTATTAGGCCATTCATGCTGTCAATTTTCTTAACT
ATTTGGAAATCCTAGTTGTACAAGATGGCCTTTTTCCCACCTGTATTTGC
TTGGTCTGTGTACTGTAGTCTGCCTCTGCAAATGTTGTGGGAGGACTAAA
TGTGGCGGGGGTGGGCTGACAG
>PSUB.GBD61H401EQ8PG:0-365
GTTCTGTATTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXGTTGTGTATTAAAAACAATGCA
ATTCTTGGGCAAAGCAGTATGATTGGTTGTGTTTAAGATATCATAGATTC
TGCATACCAGAGCATTTGAGTAAGAAATGCATTTACTAGTAATTATTTTC
ACCCCTTAAAGAAGT
>PSUB.GBD61H401DZU65:0-306
TAAACCATGGAGATCAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXCAAAAGAGGGATATTTAGTTA
GAAAAAAAACCCCACATGGAAGTAATTTTAAACGACCCTGTTGATTTGTT
ATACAG
>PSUB.GBD61H401DLDLT:0-387
ATTTTAAATGTACATCTCATTTAAAGGATTTTTTCCCTAAAGAATTGGAA
ACCGXXXXXXXXXXXXXXX
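
For reference, the file B preview looks like a FASTA-style layout: each record is a header line starting with ">" followed by one or more wrapped sequence lines. A minimal sketch of grouping such lines into (header, sequence) pairs, assuming that layout holds for the whole file. The function name read_records and the second sample record are made up for illustration.

```python
def read_records(lines):
    """Yield (header, sequence) pairs from FASTA-style lines.

    Assumes every record starts with a line beginning with '>' and that
    all following lines up to the next '>' belong to that record.
    """
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                # A new header closes the previous record.
                yield header, "".join(chunks)
            header = line
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        # Emit the last record, which has no header after it.
        yield header, "".join(chunks)


# The first record is copied from the file B preview; the second is a
# hypothetical record added only to show where one record ends.
sample = [
    ">PSUB.GBD61H401DLDLT:0-387\n",
    "ATTTTAAATGTACATCTCATTTAAAGGATTTTTTCCCTAAAGAATTGGAA\n",
    "ACCGXXXXXXXXXXXXXXX\n",
    ">EXAMPLE.ID:0-8\n",
    "ACGTACGT\n",
]
records = list(read_records(sample))
```

Because the function only needs an iterable of lines, an open file object for file B could be passed in place of the sample list.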
*Script-*
f1=open('fileA','r')
f2=open('fileB','r')
a=""
b=""
for n in f1:
    while not b.startswith(n):
        b=f2.readline()
    if len(a)>0:
        print a
    b=""
    while not b.startswith(">"):
        a=a+f2.readline()+"__"
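
As posted, the last while loop in the script above never updates b, so it cannot terminate, and the approach also depends on both files listing names in the same order. One possible alternative, sketched here under the assumption that each name in file A matches a header line in file B exactly, is to load the names into a set and make a single pass over file B with a keep flag. The names filter_lines and main are hypothetical, and the sequences in the demo are shortened from the preview.

```python
import sys


def filter_lines(name_lines, data_lines):
    """Yield the lines of data_lines that belong to a wanted record.

    A record is wanted when its '>' header line, stripped of
    surrounding whitespace, also appears in name_lines.
    """
    wanted = set(line.strip() for line in name_lines if line.strip())
    keep = False
    for line in data_lines:
        if line.startswith(">"):
            # Each header decides whether the following lines are kept.
            keep = line.strip() in wanted
        if keep:
            yield line


def main(names_path="fileA", data_path="fileB"):
    """Print the matching records; the paths are the ones from the post."""
    with open(names_path) as names:
        with open(data_path) as data:
            sys.stdout.writelines(filter_lines(names, data))


# Small in-memory demo; the sequence lines are shortened for illustration.
names = [">PSUB.GBD61H402FPT34:0-372\n", ">PSUB.GBD61H401EQ8PG:0-365\n"]
data = [
    ">PSUB.GBD61H401EQ8PG:0-365\n",
    "ACCCCTTAAAGAAGT\n",
    ">PSUB.GBD61H401DZU65:0-306\n",
    "ATACAG\n",
]
kept = list(filter_lines(names, data))
sys.stdout.writelines(kept)
```

Calling main() would run the same filter on the real files. The output follows the order of file B rather than file A, and names in file A that have no record in file B are silently skipped.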
Any help would be highly appreciated. Thanks.
Ananya