intermediate python csv reader/writer question from a beginner

Nick Craig-Wood nick at craig-wood.com
Tue Feb 24 04:31:53 EST 2009


Learning Python <labmice at gmail.com> wrote:
>  anything related to csv, I usually use VB within excel to manipulate
>  the data, nonetheless, i finally got the courage to take a dive into
>  python.  i have viewed a lot of googled csv tutorials, but none of
>  them address everything i need.  Nonetheless, I was wondering if
>  someone can help me manipulate the sample csv (sample.csv) I have
>  generated:
> 
>  ,,
>  someinfo,,,,,,,
>  somotherinfo,,,,,,,
>  SEQ,Names,Test1,Test2,Date,Time,,
>  1,Adam,1,2,Monday,1:00 PM,,
>  2,Bob,3,4,Monday,1:00 PM,,
>  3,Charlie,5,6,Monday,1:00 PM,,
>  4,Adam,7,8,Monday,2:00 PM,,
>  5,Bob,9,10,Monday,2:00 PM,,
>  6,Charlie,11,12,Monday,2:00 PM,,
>  7,Adam,13,14,Tuesday,1:00 PM,,
>  8,Bob,15,16,Tuesday,1:00 PM,,
>  9,Charlie,17,18,Tuesday,1:00 PM,,
> 
>  into (newfile.csv):
> 
>  Adam-Test1,Adam-Test2,Bob-Test1,Bob-Test2,Charlie-Test1,Charlie-Test2,Date,Time
>  1,2,3,4,5,6,Monday,1:00 PM
>  7,8,9,10,11,12,Monday,2:00 PM
>  13,14,15,16,17,18,Tuesday,1:00 PM
> 
>  note:
>  1. the true header doesn't start line 4 (if this is the case would i
>  have to use "split"?)
>  2. if there were SEQ#10-12, or 13-15, it would still be Adam, Bob,
>  Charlie, but with different Test1/Test2/Date/Time

I'm not really sure what you are trying to calculate, but this should
give you some ideas.  It skips every row that doesn't start with a
number, then groups the SEQ numbers by (Date, Time)...

import csv
from collections import defaultdict

# Python 2's csv module expects its files to be opened in binary mode
reader = csv.reader(open("sample.csv", "rb"))
result = defaultdict(list)
for row in reader:
    # ignore the row unless its first column is numeric
    if not row or not row[0].isdigit():
        continue
    n, name, a, b, day, time = row[:6]
    print "n=%r, name=%r, a=%r, b=%r, day=%r, time=%r" % (
        n, name, a, b, day, time)
    result[(day, time)].append(n)

# binary mode again, so the csv module controls the line endings itself
writer = csv.writer(open("newfile.csv", "wb"))
for key, values in result.iteritems():
    day, time = key
    values = values + [day, time]
    writer.writerow(values)

This prints

n='1', name='Adam', a='1', b='2', day='Monday', time='1:00 PM'
n='2', name='Bob', a='3', b='4', day='Monday', time='1:00 PM'
n='3', name='Charlie', a='5', b='6', day='Monday', time='1:00 PM'
n='4', name='Adam', a='7', b='8', day='Monday', time='2:00 PM'
n='5', name='Bob', a='9', b='10', day='Monday', time='2:00 PM'
n='6', name='Charlie', a='11', b='12', day='Monday', time='2:00 PM'
n='7', name='Adam', a='13', b='14', day='Tuesday', time='1:00 PM'
n='8', name='Bob', a='15', b='16', day='Tuesday', time='1:00 PM'
n='9', name='Charlie', a='17', b='18', day='Tuesday', time='1:00 PM'

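And leaves newfile.csv with the contents below (the order of the rows
can vary, because a dict does not keep its keys in any particular order)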

1,2,3,Monday,1:00 PM
7,8,9,Tuesday,1:00 PM
4,5,6,Monday,2:00 PM
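
If what you want is the layout shown in your newfile.csv, the same
approach could be extended roughly as follows.  This is only a sketch
under a few assumptions: it is written for Python 3.7 or later (it
relies on dicts keeping insertion order), it finds the real header by
looking for "SEQ" in the first column rather than counting lines, and
it reads the sample from a string so it can be run as-is.  The names
pivot and SAMPLE are made up for the example.

```python
import csv
import io

# The sample data from the question, held in a string for the demo.
SAMPLE = """\
,,
someinfo,,,,,,,
somotherinfo,,,,,,,
SEQ,Names,Test1,Test2,Date,Time,,
1,Adam,1,2,Monday,1:00 PM,,
2,Bob,3,4,Monday,1:00 PM,,
3,Charlie,5,6,Monday,1:00 PM,,
4,Adam,7,8,Monday,2:00 PM,,
5,Bob,9,10,Monday,2:00 PM,,
6,Charlie,11,12,Monday,2:00 PM,,
7,Adam,13,14,Tuesday,1:00 PM,,
8,Bob,15,16,Tuesday,1:00 PM,,
9,Charlie,17,18,Tuesday,1:00 PM,,
"""


def pivot(infile, outfile):
    """Write one output row per (Date, Time) with each name's test scores."""
    reader = csv.reader(infile)

    # Skip the preamble: read rows until the one whose first cell is "SEQ".
    for row in reader:
        if row and row[0] == "SEQ":
            break
    else:
        raise ValueError("no header row starting with 'SEQ' found")

    names = []   # names in order of first appearance
    groups = {}  # (day, time) -> {name: (test1, test2)}
    for row in reader:
        # Ignore anything that is not a data row.
        if not row or not row[0].isdigit():
            continue
        _seq, name, test1, test2, day, time = row[:6]
        if name not in names:
            names.append(name)
        groups.setdefault((day, time), {})[name] = (test1, test2)

    writer = csv.writer(outfile, lineterminator="\n")
    header = []
    for name in names:
        header += [name + "-Test1", name + "-Test2"]
    writer.writerow(header + ["Date", "Time"])

    for (day, time), scores in groups.items():
        out = []
        for name in names:
            # Leave two empty cells if a name is missing from this group.
            out.extend(scores.get(name, ("", "")))
        writer.writerow(out + [day, time])


if __name__ == "__main__":
    result = io.StringIO()
    pivot(io.StringIO(SAMPLE), result)
    print(result.getvalue(), end="")
```

To use it on real files, pass open("sample.csv", newline="") and
open("newfile.csv", "w", newline="") instead of the StringIO objects.
Because the rows are grouped by (Date, Time), extra blocks such as
SEQ 10-12 or 13-15 just become extra output rows.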

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


