[Tutor] Code optimisation

yogi byogi at yahoo.com
Sat Apr 5 07:05:05 CEST 2008


Hi,
Here is my first usable Python code.
The code works.
Here is what I'm trying to do.
I have two huge text files. After some processing, one is 12M (file A) and the other is 1M (file B).
The files have columns which are of interest to me.

I'm trying to match the entries of column [0] in file A and file B.
If they match, I then look for entries (rows in file A) whose position (column [3] of file A) falls within the ranges given by columns [1]-[2] and [3]-[4] in file B.
Columns [1] and [3] define the lower bounds of the ranges.
Columns [2] and [4] define the upper bounds of the ranges.
I have also added a variation value, var, so that I can look up +/- var around each range.
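The matching rule described above can be written as a small predicate. This is a minimal sketch, assuming that a row should be reported when its position (column [3] of file A) falls in either widened range, and that a 0 in column [3] of file B means there is no second range. The names `in_window` and `matches` are hypothetical, not part of the original script.

```python
def in_window(pos, lower, upper, var):
    """Return True if pos lies in [lower - var, upper + var + 1].

    The "+ 1" on the upper bound mirrors the script below; drop it if the
    upper bound is meant to be exact.
    """
    return lower - var <= pos <= upper + var + 1


def matches(row, gvalue, var):
    """Hypothetical helper: does a row of file A fall in a range of file B?

    row    -- columns of file A; row[0] is the chromosome, row[3] the position
    gvalue -- columns of file B; [1]-[2] is the first range, [3]-[4] the
              second, and (assumption) a 0 in column [3] means "no second range"
    """
    if row[0] != gvalue[0]:
        return False
    pos = int(row[3])
    if in_window(pos, int(gvalue[1]), int(gvalue[2]), var):
        return True
    if int(gvalue[3]) != 0:
        return in_window(pos, int(gvalue[3]), int(gvalue[4]), var)
    return False


print(matches(["chr1", "x", "y", "1500"], ["chr1", "1000", "2000", "0", "0"], 0))  # True
```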



#!/usr/bin/env python3
# This programme finds the SNPs that fall in the ranges listed in genvalues.
import csv

x = 0          # A 0 in column [3] of genvalues means "no second range"
var = 1000000  # Variation: widen every range by +/- var

# genvalues is comma-separated; read it once instead of once per row
with open("genvalues", newline="") as gfile:
    gvalues = list(csv.reader(gfile, delimiter=',', quoting=csv.QUOTE_NONE))

# divs.map is tab-separated
with open("divs.map", newline="") as mapfile:
    fis = csv.reader(mapfile, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in fis:
        pos = int(row[3])
        for gvalue in gvalues:
            # Skip unless the chromosome columns (chr) match
            if row[0] != gvalue[0]:
                continue
            # First range: columns [1] (lower) and [2] (upper), +/- var
            a = int(gvalue[1]) - var
            b = int(gvalue[2]) + var + 1
            found = a <= pos <= b
            # Second range: columns [3] (lower) and [4] (upper), only if column [3] is non-zero
            if not found and int(gvalue[3]) != x:
                c = int(gvalue[3]) - var
                d = int(gvalue[4]) + var + 1
                found = c <= pos <= d
            # Print the row once per matching gvalue line
            if found:
                print(row)

-----------------------------------------------------

Question 1: Is there a better way?
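One likely improvement, sketched here under assumptions: read file B only once and group its ranges by chromosome in a dictionary, so each row of file A is compared only against the ranges on its own chromosome instead of against every line of file B. The function names `load_ranges` and `find_snps` are hypothetical; both take any iterable of text lines, so an open file or a plain list works, and the 0-means-no-second-range rule is an assumption carried over from the script.

```python
import csv
from collections import defaultdict


def load_ranges(lines, var=1000000):
    """Read file B once and group its windows by chromosome (column [0]).

    Each value is a list of (low, high) tuples already widened by var.
    Assumption: a 0 in column [3] means there is no second range.
    """
    ranges = defaultdict(list)
    for g in csv.reader(lines, delimiter=",", quoting=csv.QUOTE_NONE):
        ranges[g[0]].append((int(g[1]) - var, int(g[2]) + var + 1))
        if int(g[3]) != 0:
            ranges[g[0]].append((int(g[3]) - var, int(g[4]) + var + 1))
    return ranges


def find_snps(lines, ranges):
    """Yield rows of file A whose position (column [3]) falls in any window
    on the same chromosome. Each row is yielded at most once."""
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        pos = int(row[3])
        # .get() avoids inserting empty lists for unknown chromosomes
        if any(low <= pos <= high for low, high in ranges.get(row[0], ())):
            yield row
```

In the real script both files would be opened with `open(path, newline="")` and passed in, for example `for row in find_snps(mapfile, load_ranges(gfile)): print(row)`. If a single chromosome still has very many ranges, sorting them and searching with the `bisect` module would be a further step, though overlapping ranges make that more involved.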
Question 2: For now I'm using the shell's time command to measure how long the script takes. Does Python provide a more fine-grained way to measure it?
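Python's standard library does provide finer-grained tools. The sketch below uses `time.perf_counter()` for a single measurement, `timeit.repeat()` for repeated runs, and `cProfile` with `pstats` to see which functions the time is spent in; `work()` is a hypothetical stand-in for the matching code. The profiler can also be run on a whole script from the shell with `python -m cProfile -s cumulative yourscript.py`.

```python
import cProfile
import pstats
import time
import timeit


def work():
    """Hypothetical stand-in for the real matching code."""
    return sum(i * i for i in range(100000))


# Wall-clock time of one call, using the highest-resolution clock available
start = time.perf_counter()
work()
elapsed = time.perf_counter() - start
print("work() took %.4f seconds" % elapsed)

# Best of 5 repetitions of 10 calls each, which smooths out noise
best = min(timeit.repeat(work, number=10, repeat=5))
print("best of 5: %.4f seconds per 10 calls" % best)

# Per-function breakdown: where is the time actually spent?
profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```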
Question 3: Should I convert this code into a function, like this?

def parse():
    ...
    ...
    ...
    ...


parse()
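Wrapping the code in a function is generally worthwhile: it can be reused and tested, and in CPython local variables inside a function are looked up faster than module-level globals, which can matter in a tight loop. A minimal sketch, assuming the function takes iterables of lines rather than file names and returns the matches instead of printing them; `parse` mirrors the name in the question, while `main` and the command-line argument order are hypothetical.

```python
import csv
import sys


def parse(map_lines, gen_lines, var=1000000):
    """Return the rows of file A (tab-separated map_lines) whose position,
    column [3], lies within +/- var of a range in file B (comma-separated
    gen_lines) on the same chromosome, column [0]."""
    # Read file B once into (chromosome, [(low, high), ...]) pairs.
    windows = []
    for g in csv.reader(gen_lines, delimiter=",", quoting=csv.QUOTE_NONE):
        spans = [(int(g[1]) - var, int(g[2]) + var + 1)]
        if int(g[3]) != 0:  # assumption: 0 in column [3] means "no second range"
            spans.append((int(g[3]) - var, int(g[4]) + var + 1))
        windows.append((g[0], spans))

    matches = []
    for row in csv.reader(map_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        pos = int(row[3])
        if any(chrom == row[0] and low <= pos <= high
               for chrom, spans in windows
               for low, high in spans):
            matches.append(row)
    return matches


def main(argv):
    """Hypothetical command-line entry point: script.py divs.map genvalues"""
    if len(argv) != 2:
        print("usage: script.py divs.map genvalues", file=sys.stderr)
        return
    map_path, gen_path = argv
    with open(map_path, newline="") as a, open(gen_path, newline="") as b:
        for row in parse(a, b):
            print(row)


# Runs only when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    main(sys.argv[1:])
```

Because `parse` returns a list, it can be checked with small in-memory inputs before running it on the 12M file.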




