[Tutor] (no subject)

kumar s ps_python at yahoo.com
Thu Jan 7 23:08:15 CET 2010


Dear tutors,
I have two files. I want to take the coordinates of a row in fileA and check whether they fall within the range of coordinates of a row in fileB. If they do, I want to map the fileA row to that fileB row; otherwise, skip it.
thanks
kumar

file a:
name     loc          x       y
a	4	40811596	40811620
b	4	40811619	40811643
c	4	40811649	40811673
d	4	40811734	40811758
e	4	40811797	40811821
f	4	40811817	40811841
g	4	40811895	40811919
h	4	40811938	40811962



file b:

                              zx       zy
z1	4	+	40810323	40812000
z2	4	+	40810323	40812000
z3	4	+	40810323	40812000
z4	4	+	40810323	40812000
z5	4	+	40810323	40812000
z6	4	+	40810323	40812000
z7	4	+	40810323	40812000
z8	4	+	40810323	40812000




I want to take coordinates x and y from each row in file a and check whether they are in the range of zx and zy (the last two columns of file b). If they are in range, I want to write both matched rows as a single tab-delimited row.
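
The check described above can be sketched as follows. This is only an illustration under assumptions: the helper names in_range and merged_row are hypothetical, zx and zy are taken from the fourth and fifth fields of a file b row (as in the sample rows), and the upper bound is treated as exclusive, as in the script that follows. Change `<` to `<=` if zy itself should count as in range.

```python
def in_range(x, y, zx, zy):
    """Return True when both x and y fall inside [zx, zy)."""
    return zx <= x < zy and zx <= y < zy


def merged_row(row_a, row_b):
    """Join two tab-delimited rows into one tab-delimited row."""
    return row_a + '\t' + row_b


# One sample row from each file, copied from the data above.
row_a = 'a\t4\t40811596\t40811620'
row_b = 'z1\t4\t+\t40810323\t40812000'

a = row_a.split('\t')
b = row_b.split('\t')

# Rows are compared only when their loc fields agree.
if a[1] == b[1] and in_range(int(a[2]), int(a[3]), int(b[3]), int(b[4])):
    print(merged_row(row_a, row_b))
```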


my code:

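# Read both files once; splitlines() drops the trailing empty row.
with open('fileA', 'r') as f1:
    dat = f1.read().splitlines()
with open('fileB', 'r') as f2:
    bat = f2.read().splitlines()

for m in dat:
    col = m.split('\t')
    for j in bat:
        cols = j.split('\t')
        # Only compare rows whose loc fields match.
        if col[1] == cols[1]:
            # fileB fields are: name, loc, '+', zx, zy
            xc = int(cols[3])
            yc = int(cols[4])
            # Same test as `in xrange(xc, yc)`, comparing the bounds directly.
            if xc <= int(col[2]) < yc and xc <= int(col[3]) < yc:
                print(m + '\t' + j)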

output:
a	4	40811596	40811620	z1	4	+	40810323	40812000



This code is too slow. Could you experts help me make the script run a lot faster?
Each file has over 50K rows, and the script runs very slowly.

Please help. 

thanks
Kumar
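
One possible way to cut the work, sketched under assumptions rather than offered as a definitive answer: parse file b once, group its rows by loc, sort each group by zx, and use the standard bisect module so that each file a row is compared only against intervals that start at or before x. The function names index_intervals and match_rows are hypothetical, the field positions follow the sample rows above, and the upper bound is exclusive. Matches for a given file a row come out ordered by zx rather than by file b order. If many intervals start before x, the inner scan is still linear, and an interval tree would be the next step.

```python
from bisect import bisect_right
from collections import defaultdict


def index_intervals(b_rows):
    """Group file b rows by loc and sort each group by start (zx)."""
    groups = defaultdict(list)
    for row in b_rows:
        cols = row.split('\t')
        # Assumed fields: name, loc, '+', zx, zy
        groups[cols[1]].append((int(cols[3]), int(cols[4]), row))
    index = {}
    for loc, items in groups.items():
        items.sort(key=lambda t: t[0])
        starts = [t[0] for t in items]
        index[loc] = (starts, items)
    return index


def match_rows(a_rows, index):
    """Yield 'rowA<TAB>rowB' for each file b interval containing x and y."""
    for row in a_rows:
        cols = row.split('\t')
        if cols[1] not in index:
            continue
        x, y = int(cols[2]), int(cols[3])
        starts, items = index[cols[1]]
        # Every interval before position hi has zx <= x.
        hi = bisect_right(starts, x)
        for i in range(hi):
            zx, zy, b_row = items[i]
            if x < zy and zx <= y < zy:
                yield row + '\t' + b_row


# Small demonstration with rows taken from the sample data
# (the loc of z2 is changed to 5 to show a non-matching loc).
a_rows = ['a\t4\t40811596\t40811620', 'h\t4\t40811938\t40811962']
b_rows = ['z1\t4\t+\t40810323\t40812000', 'z2\t5\t+\t40810323\t40812000']
for line in match_rows(a_rows, index_intervals(b_rows)):
    print(line)
```

With real input, a_rows and b_rows would come from reading fileA and fileB with read().splitlines(), so each file is read and each file b row is split only once.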


      


