Adding a column in a tab delimited txt file

Thu Aug 21 22:41:19 EDT 2003

>>>>> "Garry" == Garry  <gcf78 at hotmail.com> writes:

    Garry> Hi, I am new to python, hope someone can help me here: I
    Garry> have a MS Access exported .txt file which is tab delimited
    Garry> in total 20 columns, now I need to add another column of
    Garry> zero at the 4th column position and a column of zero at the
    Garry> 9th column position. What is the best way to do this? Can I
    Garry> write a while loop to count the number of tab I hit until
    Garry> the counter is 4 and then add a zero in between and thru
    Garry> the whole file?

Unless the file is terribly large, it will be easier to slurp the
whole thing into memory, manipulate some list structures, and then
dump back to the file.
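
If the file really is too large to hold in memory, the same insertion
can be done one line at a time.  The following is a minimal sketch in
Python 3 and is separate from the in-memory approach described in this
post.  The helper names insert_field and add_column_streaming are made
up for illustration, and the demo uses in-memory text in place of real
files.

```python
import io


def insert_field(line, index, value):
    """Return `line` with `value` added as a new tab-separated field.

    `index` is the 0-based position of the new field.
    """
    fields = line.rstrip('\n').split('\t')
    fields.insert(index, value)
    return '\t'.join(fields) + '\n'


def add_column_streaming(src, dst, index, value):
    """Copy lines from `src` to `dst`, adding one constant field to each."""
    for line in src:
        dst.write(insert_field(line, index, value))


# Demo on in-memory text; file objects returned by open() work the same way.
src = io.StringIO('a\tb\tc\n1\t2\t3\n')
dst = io.StringIO()
add_column_streaming(src, dst, 1, '0')
print(dst.getvalue(), end='')
```

Only one line is held in memory at a time, so the file size does not
matter, at the cost of handling each row separately instead of working
on whole columns.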

There are a couple of nifty things to speed you along.  You can use
string split methods to split the file on tabs and read the file into
a list of rows, each row split on the tabs.

  rows = [line.split('\t') for line in file('tabdelim.dat')]

The next fun trick is to use zip(*rows) to transpose this into a
list of columns.  You can then use the list insert method to insert
your column.  Here I'm adding a last-name column as the third column.

  cols = zip(*rows)     # transpose the 2D list
  cols.insert(2, ['Hunter', 'Sierig', 'Hunter', 'Hunter'])
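
The snippets in this post are written for Python 2, where zip()
returns a list.  In Python 3, zip() returns an iterator that has no
insert method, so the result has to be wrapped in list() first.  A
small sketch of the same transpose-and-insert step, with the sample
rows typed in by hand rather than read from a file:

```python
rows = [
    ['1', 'John', '35', 'M'],
    ['2', 'Miriam', '31', 'F'],
    ['3', 'Rahel', '5', 'F'],
    ['4', 'Ava', '2', 'F'],
]

# zip() is lazy in Python 3, so materialize it before calling insert().
cols = list(zip(*rows))
cols.insert(2, ('Hunter', 'Sierig', 'Hunter', 'Hunter'))

# Transposing a second time turns the columns back into rows (as tuples).
new_rows = list(zip(*cols))
print(new_rows[0])
```

One thing to watch for: zip() stops at the shortest input, so if any
row has fewer fields than the others, the extra fields in the longer
rows are silently dropped by the transpose.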

Now all that is left is to transpose back to rows and write the new
file, using the string method join to rejoin the columns with tabs.

  rows = zip(*cols)  # transpose back
  file('newfile.dat', 'w').writelines(['\t'.join(row) for row in rows])
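
Note that the one-liners here never strip the newline: the last field
of every row still carries its '\n', which is why writelines produces
properly terminated lines.  That stops working if the new column is
appended at the very end, because it would land after the newline.
The following Python 3 sketch strips the newline on reading and puts
it back on writing.  The function names read_rows and write_rows are
assumptions for illustration, and open() is used because the file()
builtin no longer exists in Python 3.

```python
import os
import tempfile


def read_rows(path):
    """Read a tab-delimited file into a list of rows (lists of strings)."""
    with open(path) as f:
        return [line.rstrip('\n').split('\t') for line in f]


def write_rows(path, rows):
    """Write rows of strings as tab-delimited lines, one row per line."""
    with open(path, 'w') as f:
        f.writelines('\t'.join(row) + '\n' for row in rows)


# Round trip through a temporary file to show the two halves agree.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'newfile.dat')
    write_rows(path, [('1', 'John', 'Hunter', '35', 'M'),
                      ('2', 'Miriam', 'Sierig', '31', 'F')])
    round_trip = read_rows(path)
```

The with blocks also close the files explicitly instead of leaving
that to the garbage collector.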

This script takes an input file like

  1	John	35	M
  2	Miriam	31	F
  3	Rahel	5	F
  4	Ava	2	F

and generates an output file

  1	John	Hunter	35	M
  2	Miriam	Sierig	31	F
  3	Rahel	Hunter	5	F
  4	Ava	Hunter	2	F

Damn cool!

Here is the whole script:

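    # Python 2 idioms: file() opens the file, and zip() returns a list
    rows = [line.split('\t') for line in file('tabdelim.dat')]
    cols = zip(*rows)  # transpose rows into columns
    cols.insert(2, ['Hunter', 'Sierig', 'Hunter', 'Hunter'])
    rows = zip(*cols)  # transpose back to rows
    # the last field of each row still carries its newline
    file('newfile.dat', 'w').writelines(['\t'.join(row) for row in rows])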
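
Applied to the original question (a column of zeros at the 4th and
9th positions), the same idea might look like the sketch below in
Python 3.  It assumes the positions are 1-based and refer to the
finished file, so a 20-column input becomes 22 columns.  The function
name insert_constant_columns and the stand-in row of 'c1' ... 'c20'
values are hypothetical; real rows would come from the exported file.

```python
def insert_constant_columns(rows, positions, value='0'):
    """Insert a column filled with `value` at each 1-based output position."""
    cols = list(zip(*rows))              # transpose rows into columns
    filler = (value,) * len(rows)        # one entry per row
    # Insert in ascending order so each position is the final position.
    for pos in sorted(positions):
        cols.insert(pos - 1, filler)
    return [list(row) for row in zip(*cols)]   # transpose back to rows


# Hypothetical 20-column row standing in for one line of the export.
rows = [['c%d' % i for i in range(1, 21)]]
result = insert_constant_columns(rows, [4, 9])
print(result[0][:10])
```

Inserting in ascending order matters: each insert shifts the later
columns right by one, so the second zero ends up in the 9th column of
the output rather than the 9th column of the input.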

Cheers,
John Hunter