[Tutor] (no subject)
kumar s
ps_python at yahoo.com
Thu Jan 7 23:08:15 CET 2010
dear tutors:
I have two files. I want to take the coordinates of a row in fileA and check whether they fall within the range of coordinates in fileB. If they do, I want to map them; otherwise, pass.
thanks
kumar
file a:
name loc x y
a 4 40811596 40811620
b 4 40811619 40811643
c 4 40811649 40811673
d 4 40811734 40811758
e 4 40811797 40811821
f 4 40811817 40811841
g 4 40811895 40811919
h 4 40811938 40811962
file b:
zx zy
z1 4 + 40810323 40812000
z2 4 + 40810323 40812000
z3 4 + 40810323 40812000
z4 4 + 40810323 40812000
z5 4 + 40810323 40812000
z6 4 + 40810323 40812000
z7 4 + 40810323 40812000
z8 4 + 40810323 40812000
I want to take coordinates x and y from each row in file a and check whether they are in the range of zx and zy. If they are in range, I want to write both matched rows as a single tab-delimited row.
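A minimal sketch of the check described above, written in Python 3 syntax. It assumes the column layout of the sample rows (loc in the second column of both files, x and y in the third and fourth columns of file a, zx and zy in the last two columns of file b) and treats the range as half-open, the way xrange(zx, zy) does. The name in_range is hypothetical.

```python
def in_range(x, y, zx, zy):
    # Same half-open test as `in xrange(zx, zy)`: zx is included, zy is not.
    return zx <= x < zy and zx <= y < zy

# One row from each sample file, tab-delimited (assumed layout).
row_a = "a\t4\t40811596\t40811620"
row_b = "z1\t4\t+\t40810323\t40812000"

col = row_a.split("\t")
cols = row_b.split("\t")

# Rows match when loc agrees and both coordinates fall inside [zx, zy).
if col[1] == cols[1] and in_range(int(col[2]), int(col[3]),
                                  int(cols[3]), int(cols[4])):
    merged = row_a + "\t" + row_b  # both rows as one tab-delimited row
else:
    merged = None
```

If zy itself should count as in range, the comparisons would need `<=` on the upper bound.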
my code:
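f1 = open('fileA','r')
f2 = open('fileB','r')
# read each file into a list of lines; [:-1] drops the empty string after the final newline
da = f1.read().split('\n')
dat = da[:-1]
ba = f2.read().split('\n')
bat = ba[:-1]
for m in dat:
    col = m.split('\t')
    for j in bat:
        cols = j.split('\t')
        # compare the loc column of both rows
        if col[1] == cols[1]:
            # file b as shown has a '+' column at index 2, so zx and zy sit at indexes 3 and 4
            xc = int(cols[3])
            yc = int(cols[4])
            if int(col[2]) in xrange(xc,yc):
                if int(col[3]) in xrange(xc,yc):
                    print m+'\t'+j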
output:
a 4 40811596 40811620 z1 4 + 40810323 40812000
This code is too slow. Could you experts help me make the script run a lot faster?
Each file has over 50K rows, and the script runs very slowly.
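One possible direction, offered as a sketch rather than a definitive fix: in Python 2, `n in xrange(a, b)` walks the range one element at a time, and the inner loop re-splits every file b line for every file a row. The version below (Python 3 syntax) parses file b once, groups its rows by loc, and replaces the xrange membership test with plain integer comparisons. The function names parse_b and match_rows are hypothetical, and the code assumes the column layout of the sample rows and that neither file contains a header line.

```python
from collections import defaultdict

def parse_b(lines):
    """Group file b rows by loc, converting zx and zy to int only once."""
    by_loc = defaultdict(list)
    for line in lines:
        cols = line.split("\t")
        # Assumed layout: name, loc, '+', zx, zy
        by_loc[cols[1]].append((int(cols[3]), int(cols[4]), line))
    return by_loc

def match_rows(a_lines, b_lines):
    """Return tab-joined rows where x and y both fall in [zx, zy)."""
    by_loc = parse_b(b_lines)
    out = []
    for line in a_lines:
        col = line.split("\t")
        x, y = int(col[2]), int(col[3])
        # Only rows of file b with the same loc are examined.
        for zx, zy, b_line in by_loc.get(col[1], ()):
            # Constant-time comparisons instead of scanning xrange(zx, zy).
            if zx <= x < zy and zx <= y < zy:
                out.append(line + "\t" + b_line)
    return out

# Small in-memory example standing in for the two files.
a_lines = ["a\t4\t40811596\t40811620", "x\t5\t10\t20"]
b_lines = ["z1\t4\t+\t40810323\t40812000", "z2\t4\t+\t50000000\t50000100"]
result = match_rows(a_lines, b_lines)
```

With real files, the line lists could come from something like `open('fileA').read().splitlines()`. If most rows share the same loc, this still compares every pair within that loc; sorting the file b intervals or using an interval tree would be the next step to consider.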
Please help.
thanks
Kumar