[Tutor] csv DictReader/Writer question

Fri Oct 22 16:14:40 CEST 2010

Ara Kooser wrote:

>   Thank you for your response. I did try reading the documentation but I
> missed something (or several somethings in this case). So what I see in
> the code you supplied is:
> 
> with open(source, "rb") as instream:
>    reader = csv.DictReader(instream, skipinitialspace=True)
> 
>    destfieldnames = list(reader.fieldnames)
>    destfieldnames.remove("Species2")
>    destfieldnames.remove("Protein ID2")
> 
> So this reads the csv file in but I need to run it to see what
> skipinitialspace does. 

Your csv header looks like

Field1, Field2, Field3, ...

When you feed that to the DictReader you get fieldnames

["Field1", " Field2", " Field3", ...]

skipinitialspace=True tells the DictReader to ignore whitespace that 
immediately follows a delimiter, so the leading spaces are dropped, i.e. 
you get

["Field1", "Field2", "Field3", ...]

instead.
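
If you want to see the difference without touching your real file, here is
a minimal sketch. It assumes Python 3 and uses io.StringIO as an in-memory
stand-in for the file; the sample header and data are made up for
illustration.

```python
import csv
import io

# Made-up sample: the header and the data have a space after each comma.
sample = "Field1, Field2, Field3\n1, 2, 3\n"

# Default behaviour: the space after the delimiter stays part of the name.
plain = csv.DictReader(io.StringIO(sample))
plain_names = plain.fieldnames

# With skipinitialspace=True the space after each delimiter is skipped.
stripped = csv.DictReader(io.StringIO(sample), skipinitialspace=True)
stripped_names = stripped.fieldnames

print(plain_names)
print(stripped_names)
```

The option applies to every line the reader parses, so the data values lose
their leading space as well, not only the header names.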

> Then it reads in the header line and removes the
> Species2 and Protein ID2. Does this remove all the data associated with
> those columns? For some reason I thought I had to bring these into a
> dictionary to manipulate them.

destfieldnames is the list of field names that will be written to the output 
file. I construct it by making a copy of the list of field names of the 
source and then removing the two names of the columns that you don't want in 
the output. Alternatively you could use a constant list like

destfieldnames = ["Field2", "Field5", "Field7"]

to handpick the columns.
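
Both approaches side by side, as a rough sketch. It assumes Python 3, and
the column names "Protein ID" and "Species" as well as the data row are
invented for the example; only "Species2" and "Protein ID2" come from your
file.

```python
import csv
import io

# Invented sample data standing in for the real source file.
sample = "Protein ID, Species, Protein ID2, Species2\nP1, human, P2, mouse\n"
reader = csv.DictReader(io.StringIO(sample), skipinitialspace=True)

# Approach 1: copy the source field names, then drop the unwanted ones.
# list(...) makes a copy, so reader.fieldnames itself is not modified.
destfieldnames = list(reader.fieldnames)
destfieldnames.remove("Species2")
destfieldnames.remove("Protein ID2")

# Approach 2: handpick the columns with a constant list.
handpicked = ["Protein ID", "Species"]

print(destfieldnames)
print(reader.fieldnames)
```

Copying first matters: the reader still needs the complete list of names to
build each row dictionary.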

>    with open(dest, "wb") as outstream:
>        writer = csv.DictWriter(outstream,
> destfieldnames,extrasaction="ignore")

The following line uses the csv.writer instance wrapped by the 
csv.DictWriter to write the header.

>        writer.writer.writerow(destfieldnames)

The line below iterates over the rows in the source file and writes them 
into the output file.

>        writer.writerows(reader)

A more verbose way to achieve the same thing would be

for row in reader:
    writer.writerow(row)
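
Put together, the whole copy step might look like the sketch below. It is
not the exact code from this thread: it assumes Python 3, uses in-memory
io.StringIO objects instead of real files, and invents the sample data. It
also writes the header with DictWriter.writeheader(), which is available
from Python 2.7 and 3.2 on, instead of going through writer.writer.

```python
import csv
import io

# Invented sample data standing in for the source file.
source = io.StringIO(
    "Protein ID, Species, Protein ID2, Species2\n"
    "P1, human, P2, mouse\n"
    "P3, yeast, P4, fly\n"
)
reader = csv.DictReader(source, skipinitialspace=True)

# Keep every column except the two unwanted ones.
unwanted = ("Species2", "Protein ID2")
destfieldnames = [name for name in reader.fieldnames if name not in unwanted]

# In-memory stand-in for the destination file.
dest = io.StringIO()
writer = csv.DictWriter(
    dest, destfieldnames, extrasaction="ignore", lineterminator="\n"
)
writer.writeheader()          # header line
for row in reader:            # same effect as writer.writerows(reader)
    writer.writerow(row)

result = dest.getvalue()
print(result)
```

With real files in Python 3 you would open them in text mode with
newline="" rather than with "rb" and "wb".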

Remember that each row is a dictionary that still contains the items for 
the two columns you don't want in the output. By default the DictWriter 
raises a ValueError if it encounters such extra items, but we told it to 
silently ignore them with extrasaction="ignore".
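
A small sketch of the two behaviours, assuming Python 3; the field names
and the row are invented for the example.

```python
import csv
import io

# A row with one key ("Extra") that is not among the output field names.
row = {"Field1": "a", "Field2": "b", "Extra": "c"}
fieldnames = ["Field1", "Field2"]

# Default extrasaction="raise": the unknown key triggers a ValueError.
strict = csv.DictWriter(io.StringIO(), fieldnames)
try:
    strict.writerow(row)
    error = None
except ValueError as exc:
    error = str(exc)

# extrasaction="ignore": the unknown key is silently dropped.
out = io.StringIO()
lenient = csv.DictWriter(
    out, fieldnames, extrasaction="ignore", lineterminator="\n"
)
lenient.writerow(row)
written = out.getvalue()

print(error)
print(written)
```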

> I think the first line after the open writes the field names to the file
> and the follow lines write the data is that correct? I am going to run the
> code as soon as I get home.

Come back if you have more questions.

Peter