[Tutor] Iterating over columns in a tab-delimited text and
writing SQL statements
Kent Johnson
kent_johnson at skillsoft.com
Sat Nov 6 06:33:21 CET 2004
You need two nested loops to do this. zip() is also very handy, it lets you
iterate two lists at the same time. I think this does what you want:
f1 = open('GeneName.txt', 'r')
header = f1.readline().split()[3:]
for line in f1:
data = line.split()
for sample, expr in zip(header, data[2:]):
sql = '''INSERT into expression (gene_name,probe_name,sample,exprs)
VALUES ('%s', '%s', '%s', %s)''' % (data[0], data[1], sample,
expr)
print sql
For your sample data, it prints
INSERT into expression (gene_name,probe_name,sample,exprs)
VALUES ('xyz', '1000_at', 'AL21', 272.3)
INSERT into expression (gene_name,probe_name,sample,exprs)
VALUES ('xyz', '1000_at', 'AL11', 345.2)
INSERT into expression (gene_name,probe_name,sample,exprs)
VALUES ('xyz', '1000_at', 'AL12', -32.45)
INSERT into expression (gene_name,probe_name,sample,exprs)
VALUES ('abc', '1001_at', 'AL21', 134.4)
INSERT into expression (gene_name,probe_name,sample,exprs)
VALUES ('abc', '1001_at', 'AL11', 45.3)
INSERT into expression (gene_name,probe_name,sample,exprs)
VALUES ('abc', '1001_at', 'AL12', 342.59)
By the way, why not update the database directly instead of generating a
file with ~4,000,000 insert statements? (OK, I can think of one reason - it
may well be faster to execute the inserts in batches rather than one at a
time...)
Kent
At 08:34 AM 11/5/2004 -0800, kumar s wrote:
>Dear group,
> I have a file with ~200 Columns and 20,000 rows.
>
>Every column designates a patient sample. Each such
>patient sample data should be made as an experiment in
>my database. So,
>I have to write SQL insert statements on these.
>
>My data resembles likes this in its simplest form:
>
>Gene Name probe_name AL21 AL11 AL12
>xyz 1000_at 272.3 345.2 -32.45
>abc 1001_at 134.4 45.3 342.59
>...............................................
>
>
>up to 20K rows.
>
>
>Now I want to create an sql statement like the
>following:
>
>INSERT into expression (gene_name,probe_name,sample,
>exprs) VALUES ('xyz', '1000_at','AL21',272.3)
>
>
>
>
>
>import string
> >>> from string import split
> >>> f1 = open('insert_test.txt','r')
> >>> lines = f1.read()
> >>> rows = split(lines,'\n')
> >>> rows
>['Gene Name\tprobe_name\tAL10\tAL21\tAL23',
>'xyz\t1000_at\t272.3\t345.2\t-32.45',
>'abc\t1001_at\t134.4\t45.3\t342.59', '']
> >>> f2 = open('sql.txt','w')
> >>> for i in range(len(lst1)):
> cols = split(lst1[i],'\t')
> f2.write('(INSERT INTO
>expr(gene_name,probe_name,sample,expr)
>VALUES('+cols[0]+','+cols[1]+','+sample+','+cols[2]+
>')''\n')
>
>
>
>Traceback (most recent call last):
> File "<pyshell#63>", line 3, in -toplevel-
> f2.write('(INSERT INTO
>expr(gene_name,probe_name,sample,expr)
>VALUES('+cols[0]+','+cols[1]+','+sample+','+cols[2]+
>')''\n')
>IndexError: list index out of range
> >>> f2.close()
>
>Result from sql.txt:
>
>
>(INSERT INTO expr(gene_name,probe_name,sample,expr)
>VALUES(Gene Name,probe_name,AL21,AL10)
>(INSERT INTO expr(gene_name,probe_name,sample,expr)
>VALUES(xyz,1000_at,AL21,272.3)
>(INSERT INTO expr(gene_name,probe_name,sample,expr)
>VALUES(abc,1001_at,AL21,134.4)
>
>
>
>My problem:
>
>I am unable to write a function that will take my
>matrix and write statements for every sample such as
>AL21, AL11,AL12.
>
>I know I am not good at looping over columns here.
>Take column 3 first and column 4 and then column 5
>like that... However
>column 1 and 2 remains same for every sample, because
>they are common for evert sample column.
>Something like:
>
>(INSERT INTO expr(gene_name,probe_name,sample,expr)
>VALUES(Gene Name,probe_name,AL21,AL10)
>(INSERT INTO expr(gene_name,probe_name,sample,expr)
>VALUES(xyz,1000_at,AL21,272.3)
>(INSERT INTO expr(gene_name,probe_name,sample,expr)
>VALUES(abc,1001_at,AL21,134.4)
>
>
>(INSERT INTO expr(gene_name,probe_name,sample,expr)
>VALUES(xyz,1000_at,AL11,345.2)
>(INSERT INTO expr(gene_name,probe_name,sample,expr)
>VALUES(abc,1001_at,AL11,45.3)
>
>(INSERT INTO expr(gene_name,probe_name,sample,expr)
>VALUES(xyz,1000_at,AL12,-32.5)
>(INSERT INTO expr(gene_name,probe_name,sample,expr)
>VALUES(abc,1001_at,AL12,345.59)
>
>
>
>2. Also why am I getting INdex out of range ,
>IndexError: ? I have no explanation to it.
>
>
>
>
>Can any one please help me.
>
>Thank you.
>
>Kumar.
>
>
>
>
>
>__________________________________
>Do you Yahoo!?
>Check out the new Yahoo! Front Page.
>www.yahoo.com
>
>
>_______________________________________________
>Tutor maillist - Tutor at python.org
>http://mail.python.org/mailman/listinfo/tutor
More information about the Tutor
mailing list