[BangPypers] Parsing data
mahendra N
mahendra0203 at gmail.com
Sun Sep 18 20:43:09 CEST 2011
Have you thought of using regular expressions? They might make your job easier.
Check out this link for a good explanation of regular expressions.
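
A rough sketch of what I mean, in case it helps. It assumes each record in File B is a line starting with ">" followed by sequence lines, and that ">" never appears inside the sequence data. The names RECORD_RE, parse_records and select_records are made up for illustration.

```python
import re

# Assumption: a record is a header line starting with '>' followed by
# sequence lines, and '>' never occurs inside the sequence itself.
RECORD_RE = re.compile(r"^(>[^\n]*)\n([^>]*)", re.MULTILINE)


def parse_records(text):
    """Return a dict mapping each header line to its sequence (whitespace removed)."""
    records = {}
    for match in RECORD_RE.finditer(text):
        header = match.group(1).strip()
        # Join the wrapped sequence lines into one string.
        sequence = "".join(match.group(2).split())
        records[header] = sequence
    return records


def select_records(wanted_headers_text, records_text):
    """Return (header, sequence) pairs for the headers listed in the first text."""
    records = parse_records(records_text)
    wanted = [line.strip() for line in wanted_headers_text.splitlines() if line.strip()]
    return [(header, records[header]) for header in wanted if header in records]


# Possible usage with the two files from the thread:
# pairs = select_records(open('FileA.txt').read(), open('FileB.txt').read())
```

Headers are compared as whole lines here, so a header in File A only matches the identical header in File B.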
Thanks and Regards,
Mahendra Naik
2011/9/18 Gopalakrishnan Subramani <gopalakrishnan.subramani at gmail.com>
> Senthil and Gora Mohanty pointed out what's wrong in the code.
>
> This is an alternative option, though not the best one: I feel it is always
> better to parse file B based on the file spec than to use the following approach.
>
>
> file_a_lines = open('FileA.txt').readlines()
> file_b_content = open('FileB.txt').read()
>
> for line in file_a_lines:
>     # Locate the header from file A inside the contents of file B.
>     start_pos = file_b_content.find(line)
>
>     if start_pos >= 0:
>         # The record ends where the next '>' header begins.
>         end_pos = file_b_content.find(">", start_pos + 1)
>
>         if end_pos > 0:
>             print(file_b_content[start_pos:end_pos])
>         else:  # last record in the file: no following '>'
>             print(file_b_content[start_pos:])
>
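
For comparison, here is a minimal sketch of the spec-based parse mentioned above. It assumes the only rule in the spec is that a header line starts with ">" and every following line up to the next header belongs to that record. iter_records and lookup are hypothetical names, not from the original code.

```python
def iter_records(lines):
    """Yield (header, sequence) pairs from an iterable of lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    # Emit the final record, which has no following header.
    if header is not None:
        yield header, "".join(chunks)


def lookup(headers_a, lines_b):
    """Return the records of B whose header also appears in A."""
    wanted = {h.strip() for h in headers_a if h.strip()}
    return [(h, seq) for h, seq in iter_records(lines_b) if h in wanted]


# Possible usage with the two files from the thread:
# with open('FileA.txt') as fa, open('FileB.txt') as fb:
#     for header, sequence in lookup(fa, fb):
#         print(header)
#         print(sequence)
```

This reads file B line by line instead of loading it whole, and it matches complete header lines rather than substrings.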
>
>
> On Sat, Sep 17, 2011 at 7:49 PM, Senthil Kumaran <senthil at uthcode.com> wrote:
>
> > On Fri, Sep 16, 2011 at 11:26:34PM -0500, Ananya Sharma wrote:
> > >
> > > *File A-*
> > > >PSUB.GBD61H402FPT34:0-372
> > >
> > > *File B-*
> > > >PSUB.GBD61H402FPT34:0-372
> > > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> > > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> > > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> > > XXXXXXXXNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> > > NNNNCATTTCCTTGAGTATTAGGCCATTCATGCTGTCAATTTTCTTAACT
> > > ATTTGGAAATCCTAGTTGTACAAGATGGCCTTTTTCCCACCTGTATTTGC
> > > TTGGTCTGTGTACTGTAGTCTGCCTCTGCAAATGTTGTGGGAGGACTAAA
> > > TGTGGCGGGGGTGGGCTGACAG
> >
> > Here is the simplest scenario of your case. What do you want to do
> > here: ignore XXX...CAG in File-B and print only the >PSUB. line?
> >
> > If that is the case, you could iterate over file-b and look for lines
> > starting with > and then put them to a list and then do your
> > operations.
> >
> > In your code:
> >
> > > f1=open('fileA','r')
> > > f2=open('fileB','r')
> > > a=""
> > > b=""
> >
> > > for n in f1:
> > >     while not b.startswith(n):
> > >         b=f2.readline()
> >
> > This loop will break when f2 has a line starting with >PSUB.
> >
> > >     if len(a)>0:
> > >         print a
> >
> > Won't have any effect.
> >
> > >     b=""
> >
> > You are resetting b.
> >
> > >     while not b.startswith(">"):
> > >         a=a+f2.readline()+"__"
> > >
> > Won't have any effect.
> >
> > >
> > > Any help would be highly appreciated. Thanks.
> >
> > Do you see why your program is not working when reduced to the
> > simplest case?
> >
> > If you are trying to find entities in B which are also in A, just
> > recreate B without the lines that do not start with > and then
> > compare.
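
That suggestion could look roughly like the sketch below: keep only the ">" lines of B, then check which headers from A are among them. The function names header_lines and common_headers are my own, and the sketch assumes headers must match exactly as whole lines.

```python
def header_lines(lines):
    """Keep only the lines that start with '>', stripped of surrounding whitespace."""
    return [line.strip() for line in lines if line.startswith(">")]


def common_headers(lines_a, lines_b):
    """Return the headers from A that also appear in B, in A's order."""
    in_b = set(header_lines(lines_b))  # set gives fast membership tests
    return [header for header in header_lines(lines_a) if header in in_b]


# Possible usage with the two files from the thread:
# with open('fileA') as fa, open('fileB') as fb:
#     for header in common_headers(fa, fb):
#         print(header)
```

This only reports which headers are shared; it does not print the sequence that follows each header.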
> >
> > --
> > Senthil
> >
> >
> >
> > _______________________________________________
> > BangPypers mailing list
> > BangPypers at python.org
> > http://mail.python.org/mailman/listinfo/bangpypers
> >