[BangPypers] Parsing data

mahendra N mahendra0203 at gmail.com
Sun Sep 18 20:43:09 CEST 2011


Have you thought of using regular expressions? They might make your job easier.

Check out this link for a good explanation of regular expressions.
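
As a rough sketch of the regular-expression route: this assumes file B is made
up of records that each begin with a ">" header line, as in Ananya's sample.
The pattern, the helper name records_by_header and the sample strings are all
made up for illustration, so treat it as a starting point, not a tested parser.

```python
import re

# Assumed layout: a header line starting with ">", followed by zero or more
# sequence lines that do not start with ">".
RECORD_RE = re.compile(
    r"^(>[^\n]*)(?:\n|\Z)((?:[^>\n][^\n]*(?:\n|\Z))*)",
    re.MULTILINE,
)

def records_by_header(text):
    """Return {header: sequence}, with the sequence lines joined together."""
    return {
        match.group(1).strip(): "".join(match.group(2).split())
        for match in RECORD_RE.finditer(text)
    }

# Made-up stand-ins for the contents of file B and file A.
sample_b = ">id1\nXXNN\nCATT\n>id2\nGGCC\n"
wanted = [">id1"]

found = records_by_header(sample_b)
for header in wanted:
    if header in found:
        print(header)
        print(found[header])
```

Looking headers up in a dictionary compares whole header lines, so a header
that is merely a prefix of another one will not match by accident.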

Thanks and Regards,
Mahendra Naik

2011/9/18 Gopalakrishnan Subramani <gopalakrishnan.subramani at gmail.com>

> Senthil and Gora Mohanty pointed out what's wrong with the code.
>
> Here is an alternative. It is not the best option, since I feel it is always
> better to parse file B based on its file spec than to use the following
> approach.
>
>
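> # Read the wanted headers from file A, dropping newlines and blank lines,
> # and read file B as one string.
> with open('FileA.txt') as file_a:
>     file_a_lines = [line.strip() for line in file_a if line.strip()]
> with open('FileB.txt') as file_b:
>     file_b_content = file_b.read()
>
> for line in file_a_lines:
>     start_pos = file_b_content.find(line)
>
>     if start_pos >= 0:
>         # The record ends just before the next ">" header.
>         end_pos = file_b_content.find(">", start_pos + 1)
>
>         if end_pos > 0:
>             print(file_b_content[start_pos:end_pos])
>         else:  # last record: no further ">" in the file
>             print(file_b_content[start_pos:])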
>
>
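
For comparison, parsing file B record by record might look like the sketch
below. It assumes the only "spec" is what the sample shows: a header line
starting with ">" followed by its sequence lines. The function names and the
in-memory sample data are hypothetical.

```python
def iter_records(lines):
    """Yield (header, sequence) pairs from an iterable of text lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)  # sequence line of the current record
    if header is not None:
        yield header, "".join(chunks)  # emit the last record

def select_records(wanted_lines, b_lines):
    """Return the records of B whose header line also appears in A."""
    wanted = {line.strip() for line in wanted_lines if line.strip()}
    return [(h, s) for h, s in iter_records(b_lines) if h in wanted]

# Made-up stand-ins for the lines of FileA.txt and FileB.txt.
file_a = [">id2\n"]
file_b = [">id1\n", "XXNN\n", ">id2\n", "CATT\n", "GGCC\n"]

for header, sequence in select_records(file_a, file_b):
    print(header)
    print(sequence)
```

With real files, the open file objects can be passed in place of the lists,
since iterating over a file yields its lines. File B is then read one line at
a time instead of being loaded into memory as a single string.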
>
> On Sat, Sep 17, 2011 at 7:49 PM, Senthil Kumaran <senthil at uthcode.com>
> wrote:
>
> > On Fri, Sep 16, 2011 at 11:26:34PM -0500, Ananya Sharma wrote:
> > >
> > > *File A-*
> > > >PSUB.GBD61H402FPT34:0-372
> > >
> > > *File B-*
> > > >PSUB.GBD61H402FPT34:0-372
> > > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> > > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> > > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> > > XXXXXXXXNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> > > NNNNCATTTCCTTGAGTATTAGGCCATTCATGCTGTCAATTTTCTTAACT
> > > ATTTGGAAATCCTAGTTGTACAAGATGGCCTTTTTCCCACCTGTATTTGC
> > > TTGGTCTGTGTACTGTAGTCTGCCTCTGCAAATGTTGTGGGAGGACTAAA
> > > TGTGGCGGGGGTGGGCTGACAG
> >
> > Here is the simplest scenario of your case. What do you want to do here:
> > ignore XXX...CAG in File B and print only the >PSUB. line?
> >
> > If that is the case, you could iterate over file B, look for lines
> > starting with >, put them in a list, and then do your operations.
> >
> > In your code:
> >
> > > f1=open('fileA','r')
> > > f2=open('fileB','r')
> > > a=""
> > > b=""
> >
> > > for n in f1:
> > >     while not b.startswith(n):
> > >         b=f2.readline()
> >
> > This loop will break when f2 has a line starting with >PSUB.
> >
> > >     if len(a)>0:
> > >              print a
> >
> > Won't have any effect.
> >
> > >     b=""
> >
> > You are resetting b.
> >
> > >     while not b.startswith(">"):
> > >        a=a+f2.readline()+"__"
> > >
> > Won't have any effect.
> >
> > >
> > > Any help would be highly appreciated. Thanks.
> >
> > Do you see why your program is not working when reduced to the
> > simplest case?
> >
> > If you are trying to find entities in B which are also in A, just recreate
> > B with all the lines that do not start with > removed, and then compare.
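
One way to read the suggestion above, as a minimal sketch: keep only the
header lines of each file and compare them as sets. The helper names and the
sample lines are hypothetical, and the sketch assumes headers must match
exactly once surrounding whitespace is stripped.

```python
def header_lines(lines):
    """Keep only the lines that start with '>', stripped of whitespace."""
    return [line.strip() for line in lines if line.startswith(">")]

def common_headers(a_lines, b_lines):
    """Return the headers of A that also occur in B, in A's order."""
    in_b = set(header_lines(b_lines))
    return [header for header in header_lines(a_lines) if header in in_b]

# Made-up stand-ins for the lines of file A and file B.
a_lines = [">id1\n", ">id3\n"]
b_lines = [">id1\n", "XXNN\n", ">id2\n", "CATT\n"]

print(common_headers(a_lines, b_lines))
```

This only answers which entries of A are present in B. Printing the sequence
under each header would additionally need the non-header lines of B.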
> >
> > --
> > Senthil
> >
> >
> >
> > _______________________________________________
> > BangPypers mailing list
> > BangPypers at python.org
> > http://mail.python.org/mailman/listinfo/bangpypers
> >

