[Tutor] Iterating over columns in a tab-delimited text and writing SQL statements

Brett Kelly inkedmn at gmail.com
Sat Nov 6 00:58:25 CET 2004


---------- Forwarded message ----------
From: Brett Kelly <inkedmn at gmail.com>
Date: Fri, 5 Nov 2004 12:48:40 -0800
Subject: Re: [Tutor] Iterating over columns in a tab-delimited text
and writing SQL statements
To: kumar s <ps_python at yahoo.com>


Hacked this together, hoping i understand your problem correctly:

------------------------------------------
data file to be read:

foo     bar     baz     blarg
1       2       3       "stuff"
4       5       6       "morestuff"
-----------------------------------------
python code:

# read data and create sql

sqlbase = """
insert into mytable (%s) values (%s)
"""

fd = file("somedata.txt").readlines()

cols = fd[0].split()  # column names
colstr = ','.join(cols) # formatted string that will plug in nicely
sqlresult = file("somesql.txt", 'w') # output

for r in fd[1:]: # loop through everything after first line
        vals = r.split() # split at whitespace
        valstr = ','.join(vals) # join with commas for sql
        sqlresult.write(sqlbase % (colstr, valstr))  # write to file

sqlresult.close()
print "done"
----------------------------------
result file:

insert into mytable (foo,bar,baz,blarg) values (1,2,3,"stuff")

insert into mytable (foo,bar,baz,blarg) values (4,5,6,"morestuff")
------------------------------------

Is that what you're after?

Brett



On Fri, 5 Nov 2004 08:34:15 -0800 (PST), kumar s <ps_python at yahoo.com> wrote:
> Dear group,
>  I have a file with ~200 Columns and 20,000 rows.
>
> Every column designates a patient sample.  Each such
> patient sample data should be made as an experiment in
> my database. So,
> I have to write SQL insert statements on these.
>
> My data resembles likes this in its simplest form:
>
> Gene Name  probe_name   AL21     AL11   AL12
> xyz         1000_at     272.3   345.2   -32.45
> abc         1001_at     134.4    45.3   342.59
> ...............................................
>
> up to 20K rows.
>
> Now I want to create an sql statement like the
> following:
>
> INSERT into expression (gene_name,probe_name,sample,
> exprs) VALUES ('xyz', '1000_at','AL21',272.3)
>
> import string
> >>> from string import split
> >>> f1 = open('insert_test.txt','r')
> >>> lines = f1.read()
> >>> rows = split(lines,'\n')
> >>> rows
> ['Gene Name\tprobe_name\tAL10\tAL21\tAL23',
> 'xyz\t1000_at\t272.3\t345.2\t-32.45',
> 'abc\t1001_at\t134.4\t45.3\t342.59', '']
> >>> f2 = open('sql.txt','w')
> >>> for i in range(len(lst1)):
>         cols = split(lst1[i],'\t')
>         f2.write('(INSERT INTO
> expr(gene_name,probe_name,sample,expr)
> VALUES('+cols[0]+','+cols[1]+','+sample+','+cols[2]+
> ')''\n')
>
> Traceback (most recent call last):
>   File "<pyshell#63>", line 3, in -toplevel-
>     f2.write('(INSERT INTO
> expr(gene_name,probe_name,sample,expr)
> VALUES('+cols[0]+','+cols[1]+','+sample+','+cols[2]+
> ')''\n')
> IndexError: list index out of range
> >>> f2.close()
>
> Result from sql.txt:
>
> (INSERT INTO expr(gene_name,probe_name,sample,expr)
> VALUES(Gene Name,probe_name,AL21,AL10)
> (INSERT INTO expr(gene_name,probe_name,sample,expr)
> VALUES(xyz,1000_at,AL21,272.3)
> (INSERT INTO expr(gene_name,probe_name,sample,expr)
> VALUES(abc,1001_at,AL21,134.4)
>
> My problem:
>
> I am unable to write a function that will take my
> matrix and write statements for every sample such as
> AL21, AL11,AL12.
>
> I know I am not good at looping over columns here.
> Take column 3 first and column 4 and then column 5
> like that... However
> column 1 and 2 remains same for every sample, because
> they are common for evert sample column.
> Something like:
>
> (INSERT INTO expr(gene_name,probe_name,sample,expr)
> VALUES(Gene Name,probe_name,AL21,AL10)
> (INSERT INTO expr(gene_name,probe_name,sample,expr)
> VALUES(xyz,1000_at,AL21,272.3)
> (INSERT INTO expr(gene_name,probe_name,sample,expr)
> VALUES(abc,1001_at,AL21,134.4)
>
> (INSERT INTO expr(gene_name,probe_name,sample,expr)
> VALUES(xyz,1000_at,AL11,345.2)
> (INSERT INTO expr(gene_name,probe_name,sample,expr)
> VALUES(abc,1001_at,AL11,45.3)
>
> (INSERT INTO expr(gene_name,probe_name,sample,expr)
> VALUES(xyz,1000_at,AL12,-32.5)
> (INSERT INTO expr(gene_name,probe_name,sample,expr)
> VALUES(abc,1001_at,AL12,345.59)
>
> 2. Also why am I getting INdex out of range ,
> IndexError: ? I have no explanation to it.
>
> Can any one please help me.
>
> Thank you.
>
> Kumar.
>
> __________________________________
> Do you Yahoo!?
> Check out the new Yahoo! Front Page.
> www.yahoo.com
>
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
>


--
Brett Kelly
http://inkedmn.com:8000


-- 
Brett Kelly
http://inkedmn.com:8000


More information about the Tutor mailing list