[BangPypers] Parsing data

Gopalakrishnan Subramani gopalakrishnan.subramani at gmail.com
Sun Sep 18 19:31:51 CEST 2011


Senthil and Gora Mohanty pointed out what's wrong with the code.

This is an alternative option, though not the best one: I feel it is always
better to parse file B according to its file spec than to use the following
approach.


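with open('FileA.txt') as file_a:
    # Strip newlines so a header on the last line of either file still matches.
    headers = [line.strip() for line in file_a if line.strip()]
with open('FileB.txt') as file_b:
    file_b_content = file_b.read()

for header in headers:
    start_pos = file_b_content.find(header)

    if start_pos >= 0:
        # A record runs up to the next '>' header, or to the end of the file.
        end_pos = file_b_content.find(">", start_pos + 1)

        if end_pos > 0:
            print(file_b_content[start_pos:end_pos])
        else:  # last record in the file: no following '>'
            print(file_b_content[start_pos:])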
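
For comparison, here is a rough sketch of what parsing file B by its layout
could look like. It assumes the format is what the sample in this thread
suggests: a header line starting with ">" followed by one or more sequence
lines. The function names parse_records and select_records are made up for
this example, and if a header occurs twice in file B the later record wins.

```python
def parse_records(text):
    """Map each '>' header line to the list of sequence lines after it."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return records


def select_records(wanted_headers, records):
    """Return the full text of each wanted record found in records."""
    selected = []
    for header in wanted_headers:
        header = header.strip()
        # Exact match on the whole header, so '>x:0-3' never matches '>x:0-37'.
        if header in records:
            selected.append("\n".join([header] + records[header]))
    return selected
```

With the files from this thread it would be called roughly as
select_records(open('FileA.txt'), parse_records(open('FileB.txt').read())),
printing each returned string. Unlike the find() approach, the lookup compares
whole header lines, so a header that is only a prefix of another one is not
picked up by accident.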



On Sat, Sep 17, 2011 at 7:49 PM, Senthil Kumaran <senthil at uthcode.com> wrote:

> On Fri, Sep 16, 2011 at 11:26:34PM -0500, Ananya Sharma wrote:
> >
> > *File A-*
> > >PSUB.GBD61H402FPT34:0-372
> >
> > *File B-*
> > >PSUB.GBD61H402FPT34:0-372
> > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> > XXXXXXXXNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> > NNNNCATTTCCTTGAGTATTAGGCCATTCATGCTGTCAATTTTCTTAACT
> > ATTTGGAAATCCTAGTTGTACAAGATGGCCTTTTTCCCACCTGTATTTGC
> > TTGGTCTGTGTACTGTAGTCTGCCTCTGCAAATGTTGTGGGAGGACTAAA
> > TGTGGCGGGGGTGGGCTGACAG
>
> Here is the simplest scenario of your case. In this, what do you want
> to do? Ignore XXX...CAG in File-B and print only the >PSUB. line?
>
> If that is the case, you could iterate over file-b and look for lines
> starting with > and then put them to a list and then do your
> operations.
>
> In your code:
>
> > f1=open('fileA','r')
> > f2=open('fileB','r')
> > a=""
> > b=""
>
> > for n in f1:
> >     while not b.startswith(n):
> >         b=f2.readline()
>
> This loop will break when f2 has line starting with >PSUB.
>
> >     if len(a)>0:
> >              print a
>
> Won't have any effect.
>
> >     b=""
>
> You are resetting b.
>
> >     while not b.startswith(">"):
> >        a=a+f2.readline()+"__"
> >
> Won't have any effect.
>
> >
> > Any help would be highly appreciated. Thanks.
>
> Do you see why your program is not working when reduced to the
> simplest case?
>
> If you are trying to find entities in B which are in A.
> Just recreate B so that you remove all the non > starting lines and
> then compare.
>
> --
> Senthil

