[Tutor] creating a tab delim file

Tue Apr 18 23:55:18 CEST 2006

On 18 Apr 2006, srini_iyyer_bio at yahoo.com wrote:

> The problem: 
>
> I have 50 tab delim files. Each file has 500 rows and
> 50 columns.
>
> I have to read the first column of each file. Repeat
> the same for 50 files and write a tab delim text file 
> containing 500 rows and 50 columns. 
>
> code that works through half of the problem:
>
> import glob
>
> files = glob.glob('*.proc')
>
>
> for each in files:
>       f = open(each,'r')
>       da = f.read().split('\n')
>       dat = da[:-1]
>       for m in dat:
>             mycol = m.split('\t')[0] 
>             ..................

You don't need to read the whole file at once. You can read individual
lines from a file with:
      f = open('name')
      for line in f:
          # do something with line

I'll show you a different solution for your problem; if you don't
understand it, ask (I'll try to explain it).

--8<---------------cut here---------------start------------->8---
import glob

# Open every input file; filehdls holds one file handle per file.
filehdls = [open(f) for f in glob.glob('*.proc')]
out = open('reordered.prc', 'w')

# Read one line from each file and keep only its first column.
col_1 = [f.readline().split('\t')[0] for f in filehdls]
while col_1[0]:
    out.write('\t'.join(col_1))
    out.write('\n')
    col_1 = [f.readline().split('\t')[0] for f in filehdls]

out.close()
for f in filehdls: f.close()

--8<---------------cut here---------------end--------------->8---

filehdls is a list of file handles. 
col_1 is a list of the values of column 1 of each of the files.
How does it work?
    f.readline().split('\t')[0]
Read it from left to right. 
First we call readline(), which reads the next line from the file or
returns the empty string if it has reached the end of the file.
Then we call split('\t') on the string returned from readline().  This
returns a list of strings obtained from splitting the string at each
tab.
Then we take the first element from the list (index 0) since we are
only interested in column 1.
We do this for every file in the list of file handles.
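
To see what each step of that expression returns, here is a small
sketch.  It uses io.StringIO as a stand-in for a real file, and the two
rows of data are made up for illustration:

```python
import io

# Stand-in for an open file; the two rows are made-up sample data.
f = io.StringIO("a1\tb1\tc1\na2\tb2\tc2\n")

line = f.readline()         # the whole first line, newline included
fields = line.split('\t')   # one string per tab-separated column
first = fields[0]           # the value of column 1

f.readline()                # consume the second row
at_eof = f.readline()       # nothing left to read: empty string
# ''.split('\t') is [''], so taking index 0 is still safe at end of file.
eof_first = at_eof.split('\t')[0]
```

Note that only the last field of a line keeps the trailing newline, so
the first column is clean as long as a line has more than one column.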

The while loop runs as long as the first element of col_1 is true.  At
the end of the file readline() returns an empty string, which counts as
false, so the loop stops there.  Inside the loop we join the columns
with a tab, write that string to our output file and write a newline to
that file.  Then we read the next line from each file.

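The above will only work correctly if all files have the same number of
lines: the loop stops as soon as the first file is exhausted, and any
file that ends earlier contributes empty strings from then on.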
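
If the files might differ in length, one possible variant (a sketch, not
part of the solution above) is to let itertools.zip_longest walk all
files in step and pad the shorter ones.  The function name
merge_first_columns and the fill parameter are made up for this example,
and it assumes Python 3:

```python
from itertools import zip_longest

def merge_first_columns(paths, out_path, fill=''):
    """Write the first column of each file in paths side by side."""
    handles = [open(p) for p in paths]
    try:
        with open(out_path, 'w') as out:
            # zip_longest runs until the longest file is exhausted;
            # files that ended earlier yield None for the missing line.
            for lines in zip_longest(*handles):
                cols = [fill if line is None
                        else line.rstrip('\n').split('\t')[0]
                        for line in lines]
                out.write('\t'.join(cols) + '\n')
    finally:
        for h in handles:
            h.close()
```

It could be called as
merge_first_columns(sorted(glob.glob('*.proc')), 'reordered.prc');
sorting the names is an extra assumption here that makes the column
order predictable.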

   Karl
-- 
Please do *not* send copies of replies to me.
I read the list