[Tutor] csv DictReader/Writer question

Thu Oct 21 10:24:10 CEST 2010

Ara Kooser wrote:

> Hello all,
> 
>   I have a csv file (from a previous code output). It looks like this:
> Species2, Protein ID, E_value, Length, Hit From, Hit to, Protein ID2,
> Locus Tag, Start/Stop, Species
> Streptomyces sp. AA4,  ZP_05482482,  2.8293600000000001e-140,  5256, 
> 1824,
> 2249\n, ZP_05482482,  StAA4_0101000304844,
> complement(NZ_ACEV01000078.1:25146..40916)4,  Streptomyces sp. AA4: 0\n
> Streptomyces sp. AA4,  ZP_05482482,  8.0333299999999997e-138,  5256,  123,
> 547\n, ZP_05482482,  StAA4_0101000304844,
> complement(NZ_ACEV01000078.1:25146..40916)4,  Streptomyces sp. AA4: 0\n
> Streptomyces sp. AA4,  ZP_05482482,  1.08889e-124,  5256,  3539,  3956\n,
> ZP_05482482,  StAA4_0101000304844,
> complement(NZ_ACEV01000078.1:25146..40916)4,  Streptomyces sp. AA4: 0\n
> ....
> 
> I want to remove certain sections in each line, so I wrote this code
> using csv.DictReader:
> import csv
> data = csv.DictReader(open('strep_aa.txt'))
> 
> for x in data:
>     del x['Species2']
>     del x[' Protein ID2']
>     print x
> 
>   When it prints to the screen everything works great:
> {' Hit From': '  1824', ' Hit to': '  2249\\n', ' Protein ID': '
> ZP_05482482', ' Locus Tag': '  StAA4_0101000304844', ' Start/Stop': '
> complement(NZ_ACEV01000078.1:25146..40916)4', ' Species': '  Streptomyces
> sp. AA4: 0\\n', ' Length': '  5256', ' E_value': '
> 2.8293600000000001e-140'}
> {' Hit From': '  123', ' Hit to': '  547\\n', ' Protein ID': '
> ZP_05482482', ' Locus Tag': '  StAA4_0101000304844', ' Start/Stop': '
> complement(NZ_ACEV01000078.1:25146..40916)4', ' Species': '  Streptomyces
> sp. AA4: 0\\n', ' Length': '  5256', ' E_value': '
> 8.0333299999999997e-138'}
> 
> What I don't know how to do is export this back out to a csv file and
> rearrange each key as a column header so it would look like this:
> Species  Protein ID  E Value  .....
> 
> I thought csv.DictWriter would be the way to go since it writes
> dictionaries to text files. I was wondering how do I go about doing this?
> I don't really understand the syntax.

It's not about syntax; you have to read the docs for the csv module 
carefully to find the information that is relevant for your problem.
For starters you could try to find out what the skipinitialspace and 
extrasaction keyword parameters are supposed to do in the example below:

import csv

# source and dest are the input and output file paths
with open(source, "rb") as instream:
    reader = csv.DictReader(instream, skipinitialspace=True)

    destfieldnames = list(reader.fieldnames)
    destfieldnames.remove("Species2")
    destfieldnames.remove("Protein ID2")

    with open(dest, "wb") as outstream:
        writer = csv.DictWriter(outstream, destfieldnames,
                                extrasaction="ignore")
        writer.writer.writerow(destfieldnames)
        writer.writerows(reader)

Can you find the line where I'm writing the headers to the destination file?

Peter
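
For readers on Python 3, here is a minimal sketch of the same idea run against in-memory text. The two-row SAMPLE string is a shortened, hypothetical stand-in for the file in the question, and io.StringIO replaces the real files (the "rb"/"wb" modes in the reply above are Python 2 usage). It is meant only to illustrate what skipinitialspace and extrasaction appear to change, not to replace reading the csv docs.

```python
import csv
import io

# Hypothetical two-row sample in the same "comma plus space" style as the
# file in the question; the values are shortened for illustration.
SAMPLE = (
    "Species2, Protein ID, E_value, Protein ID2\n"
    "Streptomyces sp. AA4, ZP_05482482, 1.08889e-124, ZP_05482482\n"
)

# Without skipinitialspace the blank after each comma stays in the keys,
# which is why the original loop needed x[' Protein ID2'].
plain = csv.DictReader(io.StringIO(SAMPLE))
plain_fields = plain.fieldnames

# With skipinitialspace=True the blank after each delimiter is dropped.
reader = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)
dest_fields = [name for name in reader.fieldnames
               if name not in ("Species2", "Protein ID2")]

# extrasaction="ignore" lets DictWriter skip keys in each row that are
# not listed in dest_fields, instead of raising ValueError.
out = io.StringIO(newline="")
writer = csv.DictWriter(out, dest_fields, extrasaction="ignore")
writer.writeheader()
writer.writerows(reader)
result = out.getvalue()
print(result)
```

Filtering the field names and leaving the rows untouched means no per-row del is needed; the writer simply ignores the unwanted columns. writeheader() is available in Python 2.7 and 3.2 or later. When writing to a real file in Python 3, the csv docs recommend opening it with newline="".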