[Tutor] Iterating over columns in a tab-delimited text and writing
SQL statements
kumar s
ps_python at yahoo.com
Fri Nov 5 17:34:15 CET 2004
Dear group,
I have a file with ~200 Columns and 20,000 rows.
Every column designates a patient sample. Each such
patient sample data should be made as an experiment in
my database. So,
I have to write SQL insert statements on these.
My data resembles likes this in its simplest form:
Gene Name probe_name AL21 AL11 AL12
xyz 1000_at 272.3 345.2 -32.45
abc 1001_at 134.4 45.3 342.59
...............................................
up to 20K rows.
Now I want to create an sql statement like the
following:
INSERT into expression (gene_name,probe_name,sample,
exprs) VALUES ('xyz', '1000_at','AL21',272.3)
import string
>>> from string import split
>>> f1 = open('insert_test.txt','r')
>>> lines = f1.read()
>>> rows = split(lines,'\n')
>>> rows
['Gene Name\tprobe_name\tAL10\tAL21\tAL23',
'xyz\t1000_at\t272.3\t345.2\t-32.45',
'abc\t1001_at\t134.4\t45.3\t342.59', '']
>>> f2 = open('sql.txt','w')
>>> for i in range(len(lst1)):
cols = split(lst1[i],'\t')
f2.write('(INSERT INTO
expr(gene_name,probe_name,sample,expr)
VALUES('+cols[0]+','+cols[1]+','+sample+','+cols[2]+
')''\n')
Traceback (most recent call last):
File "<pyshell#63>", line 3, in -toplevel-
f2.write('(INSERT INTO
expr(gene_name,probe_name,sample,expr)
VALUES('+cols[0]+','+cols[1]+','+sample+','+cols[2]+
')''\n')
IndexError: list index out of range
>>> f2.close()
Result from sql.txt:
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(Gene Name,probe_name,AL21,AL10)
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(xyz,1000_at,AL21,272.3)
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(abc,1001_at,AL21,134.4)
My problem:
I am unable to write a function that will take my
matrix and write statements for every sample such as
AL21, AL11,AL12.
I know I am not good at looping over columns here.
Take column 3 first and column 4 and then column 5
like that... However
column 1 and 2 remains same for every sample, because
they are common for evert sample column.
Something like:
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(Gene Name,probe_name,AL21,AL10)
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(xyz,1000_at,AL21,272.3)
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(abc,1001_at,AL21,134.4)
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(xyz,1000_at,AL11,345.2)
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(abc,1001_at,AL11,45.3)
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(xyz,1000_at,AL12,-32.5)
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(abc,1001_at,AL12,345.59)
2. Also why am I getting INdex out of range ,
IndexError: ? I have no explanation to it.
Can any one please help me.
Thank you.
Kumar.
__________________________________
Do you Yahoo!?
Check out the new Yahoo! Front Page.
www.yahoo.com
More information about the Tutor
mailing list