[BangPypers] Parsing data

Ananya Sharma as1438 at msstate.edu
Tue Sep 20 04:20:03 CEST 2011


Thank you, everybody, for your input. It helped a lot in understanding the
logic I had to implement in the program. I am familiar with the concept of
regular expressions, but in this case line breaks would have been an issue.
In the end, I did some more reading, browsed through examples, and got the
script to work.

Thanks.

Ananya
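
A minimal sketch of the line-by-line approach suggested in the thread below:
walk through file B once, start a new record at every line beginning with
">", and collect the sequence lines that follow it. The function names
parse_records and select_records are made up for illustration, the sample
data is shortened from the example in the thread (the second record is
invented), and this is not necessarily the script that ended up working.

```python
def parse_records(lines):
    """Group lines into {header: sequence}; a header is a line starting with '>'."""
    records = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line
            records[header] = ""
        elif header is not None:
            # Sequence lines are wrapped, so join them back together.
            records[header] += line
    return records


def select_records(wanted_lines, records):
    """Return (header, sequence) pairs for the headers listed in file A."""
    wanted = [w.strip() for w in wanted_lines if w.strip()]
    return [(h, records[h]) for h in wanted if h in records]


# Shortened sample data; the ">OTHER.RECORD" entry is hypothetical.
# With real files, pass open('FileB.txt') and open('FileA.txt') instead.
file_b = [
    ">PSUB.GBD61H402FPT34:0-372\n",
    "XXXXXXXXNNNN\n",
    "NNNNCATTTCCT\n",
    ">OTHER.RECORD:0-10\n",
    "ACGTACGTAC\n",
]
file_a = [">PSUB.GBD61H402FPT34:0-372\n"]

matches = select_records(file_a, parse_records(file_b))
for header, sequence in matches:
    print(header)
    print(sequence)
```

Building the dictionary first means file B is read only once, however many
headers file A contains.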

On Sun, Sep 18, 2011 at 6:30 PM, Gopalakrishnan Subramani <
gopalakrishnan.subramani at gmail.com> wrote:

> I, at least, am not great at regular expressions. I agree that a regex may
> reduce the number of lines.
>
>
> On Sun, Sep 18, 2011 at 12:55 PM, mahendra N <mahendra0203 at gmail.com>
> wrote:
>
> > Forgot the link:
> >
> > http://code.google.com/edu/languages/google-python-class/regular-expressions.html
> >
> > 2011/9/19 mahendra N <mahendra0203 at gmail.com>
> >
> > > Have you thought of using regular expressions? They might make your job
> > > easier.
> > >
> > > Check out this link for a good explanation of regular expressions.
> > >
> > > Thanks and Regards,
> > > Mahendra Naik
> > >
> > >
> > > 2011/9/18 Gopalakrishnan Subramani <gopalakrishnan.subramani at gmail.com>
> > >
> > >> Senthil and Gora Mohanty pointed out what is wrong with the code.
> > >>
> > >> Here is an alternative option. It is not the best one, since I feel it is
> > >> always better to parse file B according to its file spec than to use the
> > >> following approach.
> > >>
> > >>
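> > >> # Read the headers from file A and the whole of file B.
> > >> with open('FileA.txt') as file_a:
> > >>     file_a_lines = file_a.readlines()
> > >> with open('FileB.txt') as file_b:
> > >>     file_b_content = file_b.read()
> > >>
> > >> for line in file_a_lines:
> > >>     if not line.strip():
> > >>         continue  # a blank line would match any newline in file B
> > >>     start_pos = file_b_content.find(line)
> > >>
> > >>     if start_pos >= 0:
> > >>         # The record runs up to the next '>' header, if there is one.
> > >>         end_pos = file_b_content.find(">", start_pos + 1)
> > >>
> > >>         if end_pos >= 0:
> > >>             print(file_b_content[start_pos:end_pos])
> > >>         else:  # last record: it runs to the end of the file
> > >>             print(file_b_content[start_pos:])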
> > >>
> > >>
> > >>
> > >> On Sat, Sep 17, 2011 at 7:49 PM, Senthil Kumaran <senthil at uthcode.com>
> > >> wrote:
> > >>
> > >> > On Fri, Sep 16, 2011 at 11:26:34PM -0500, Ananya Sharma wrote:
> > >> > >
> > >> > > *File A-*
> > >> > > >PSUB.GBD61H402FPT34:0-372
> > >> > >
> > >> > > *File B-*
> > >> > > >PSUB.GBD61H402FPT34:0-372
> > >> > > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> > >> > > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> > >> > > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> > >> > > XXXXXXXXNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> > >> > > NNNNCATTTCCTTGAGTATTAGGCCATTCATGCTGTCAATTTTCTTAACT
> > >> > > ATTTGGAAATCCTAGTTGTACAAGATGGCCTTTTTCCCACCTGTATTTGC
> > >> > > TTGGTCTGTGTACTGTAGTCTGCCTCTGCAAATGTTGTGGGAGGACTAAA
> > >> > > TGTGGCGGGGGTGGGCTGACAG
> > >> >
> > >> > Here is the simplest scenario of your case. What do you want to do
> > >> > here: ignore XXX...CAG in File-B and print only the >PSUB. line?
> > >> >
> > >> > If that is the case, you could iterate over file-b, look for lines
> > >> > starting with >, put them in a list, and then do your
> > >> > operations.
> > >> >
> > >> > In your code:
> > >> >
> > >> > > f1=open('fileA','r')
> > >> > > f2=open('fileB','r')
> > >> > > a=""
> > >> > > b=""
> > >> >
> > >> > > for n in f1:
> > >> > >     while not b.startswith(n):
> > >> > >         b=f2.readline()
> > >> >
> > >> > This loop will end when f2 reaches a line starting with >PSUB.
> > >> >
> > >> > >     if len(a)>0:
> > >> > >              print a
> > >> >
> > >> > Won't have any effect.
> > >> >
> > >> > >     b=""
> > >> >
> > >> > You are resetting b.
> > >> >
> > >> > >     while not b.startswith(">"):
> > >> > >        a=a+f2.readline()+"__"
> > >> > >
> > >> > Won't have any effect.
> > >> >
> > >> > >
> > >> > > Any help would be highly appreciated. Thanks.
> > >> >
> > >> > Do you see why your program is not working when reduced to the
> > >> > simplest case?
> > >> >
> > >> > If you are trying to find entities in B which are in A, just
> > >> > recreate B with all the lines that do not start with > removed, and
> > >> > then compare.
> > >> >
> > >> > --
> > >> > Senthil
> > >> >
> > >> >
> > >> >
> > >> > _______________________________________________
> > >> > BangPypers mailing list
> > >> > BangPypers at python.org
> > >> > http://mail.python.org/mailman/listinfo/bangpypers
> > >> >
>
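
On the regular-expression suggestion in the thread: line breaks need not get
in the way if the whole of file B is read into one string and the pattern is
allowed to span lines. A rough sketch, assuming the same ">" header layout as
in the example above; the name find_record and the shortened sample data
(including the second record) are illustrative only.

```python
import re


def find_record(header, content):
    """Return the sequence following `header` in `content`, or None if absent."""
    # re.escape() protects characters such as '.' in the header.
    # [^>]* matches everything, newlines included, up to the next '>'.
    pattern = re.compile(
        r"^" + re.escape(header) + r"[ \t]*\n([^>]*)", re.MULTILINE
    )
    match = pattern.search(content)
    if match is None:
        return None
    # Drop the line breaks that wrap the sequence.
    return "".join(match.group(1).split())


# Shortened sample data; the ">OTHER.RECORD" entry is hypothetical.
content = (
    ">PSUB.GBD61H402FPT34:0-372\n"
    "XXXXXXXXNNNN\n"
    "NNNNCATTTCCT\n"
    ">OTHER.RECORD:0-10\n"
    "ACGTACGTAC\n"
)

print(find_record(">PSUB.GBD61H402FPT34:0-372", content))
```

The negated character class does the work here: because it excludes only
">", it runs across newlines without needing re.DOTALL.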

