From cjm at ava.com.au  Wed Oct 13 16:32:14 2004
From: cjm at ava.com.au (Chris Munchenberg)
Date: Thu, 14 Oct 2004 00:02:14 +0930
Subject: [Csv] PEP 305
Message-ID: <416D3C6E.10305@ava.com.au>

Hi,

I have been using the csv module extensively, and find it very useful.

The only problem is that many of the csv files I use have header rows, and
the DictReader doesn't cope with them well. I've started using a
slightly modified version. Feel free to do whatever you like with it: use
it, or delete it immediately from your inbox.

Chris Munchenberg.
================================================================

from csv import reader

class MyDictReader:
    def __init__(self, f, fieldnames = None, restkey=None, restval=None,
                 dialect="excel", *args, **kwds):
        f.seek(0)
        self.reader = reader(f, dialect, *args, **kwds)
        self.f = f
        if fieldnames:
            self.fieldnames = fieldnames    # list of keys for the dict
        else:
            # use header row as keys for the dictionary
            self.fieldnames = self.reader.next()
        self.restkey = restkey          # key to catch long rows
        self.restval = restval          # default value for short rows

    def __iter__(self):
        return self

    def next(self):
        row = self.reader.next()
        # unlike the basic reader, we prefer not to return blanks,
        # because we will typically wind up with a dict full of None
        # values
        while row == []:
            row = self.reader.next()
        d = dict(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)
        if lf < lr:
            d[self.restkey] = row[lf:]
        elif lf > lr:
            for key in self.fieldnames[lr:]:
                d[key] = self.restval
        return d

    def reset(self):
        self.f.seek(0)
        self.f.readline()
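
For anyone reading this on Python 3, where `reader.next()` is spelled
`next(reader)`, the same behaviour can be sketched as a generator. This is
only an illustrative port under my own assumptions, not the csv module's
implementation: the helper name `iter_dict_rows` and the sample data are
hypothetical.

```python
import csv
import io


def iter_dict_rows(f, fieldnames=None, restkey=None, restval=None,
                   dialect="excel", **kwds):
    """Yield one dict per non-blank CSV row (hypothetical helper)."""
    rows = csv.reader(f, dialect, **kwds)
    if fieldnames is None:
        # No names supplied: treat the first row as the header.
        fieldnames = next(rows, None)
        if fieldnames is None:
            return  # empty input, nothing to yield
    for row in rows:
        if not row:
            # Skip blank lines rather than yield a dict full of restval.
            continue
        d = dict(zip(fieldnames, row))
        if len(row) > len(fieldnames):
            # Long row: collect the surplus values under restkey.
            d[restkey] = row[len(fieldnames):]
        else:
            # Short row: pad the missing fields with restval.
            for key in fieldnames[len(row):]:
                d[key] = restval
        yield d


# Made-up sample: a header, a normal row, a blank line, a short row
# and a long row.
sample = io.StringIO("name,city\nAda,London\n\nBob\nCy,Oslo,extra\n")
records = list(iter_dict_rows(sample, restkey="rest", restval=""))
```

A generator has no `reset()`; to start over, seek the file back to 0 and
call the helper again.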


From skip at pobox.com  Sun Oct 17 15:51:39 2004
From: skip at pobox.com (Skip Montanaro)
Date: Sun, 17 Oct 2004 08:51:39 -0500
Subject: [Csv] PEP 305
In-Reply-To: <416D3C6E.10305@ava.com.au>
References: <416D3C6E.10305@ava.com.au>
Message-ID: <16754.30955.440451.414391@montanaro.dyndns.org>


    Chris> The only thing is that many of the csv files I use have headers -
    Chris> and the Dictionary Reader doesn't cope with them well. 

In 2.4 this gets better.  If the fieldnames arg to the constructor is None,
it assumes the first line contains column headers.

    % head -3 garageband.csv 
    "Bandname","Date","Venue","Address","City","State","Zip","Comments","gig_url","band_url","song_url","venue_url","spotlight"
    "18 Wheeler","2004-10-01","Greenfield's","3355 S Yarrow Street,","Denver","Colorado","","","http://www.garageband.com/gigs/profile.html?|pe1|X8TaC3TQ6KinZ1WxZA","http://www.garageband.com/artist/18wheeler","","http://www.jambase.com/search.asp?venueID=12774&di","F"
    "18 Wheeler","2004-10-02","Greenfield's","3355 S Yarrow Street","Denver","Colorado","","","http://www.garageband.com/gigs/profile.html?|pe1|X8TaC3TQ6KinZ1Wxaw","http://www.garageband.com/artist/18wheeler","","http://www.jambase.com/search.asp?venueID=12774&di","F"
    % python
    Python 2.4a3 (#47, Sep 11 2004, 13:51:15) 
    [GCC 3.3 20030304 (Apple Computer, Inc. build 1493)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import csv
    >>> rdr = csv.DictReader(open("garageband.csv", "rb"))
    >>> rdr.next()
    {'City': 'Denver', 'band_url': 'http://www.garageband.com/artist/18wheeler', 'Zip': '', 'song_url': '', 'Venue': "Greenfield's", 'Comments': '', 'State': 'Colorado', 'venue_url': 'http://www.jambase.com/search.asp?venueID=12774&di', 'Address': '3355 S Yarrow Street,', 'Date': '2004-10-01', 'Bandname': '18 Wheeler', 'spotlight': 'F', 'gig_url': 'http://www.garageband.com/gigs/profile.html?|pe1|X8TaC3TQ6KinZ1WxZA'}
    >>> rdr.next()
    {'City': 'Denver', 'band_url': 'http://www.garageband.com/artist/18wheeler', 'Zip': '', 'song_url': '', 'Venue': "Greenfield's", 'Comments': '', 'State': 'Colorado', 'venue_url': 'http://www.jambase.com/search.asp?venueID=12774&di', 'Address': '3355 S Yarrow Street', 'Date': '2004-10-02', 'Bandname': '18 Wheeler', 'spotlight': 'F', 'gig_url': 'http://www.garageband.com/gigs/profile.html?|pe1|X8TaC3TQ6KinZ1Wxaw'}
    >>> rdr.fieldnames
    ['Bandname', 'Date', 'Venue', 'Address', 'City', 'State', 'Zip', 'Comments', 'gig_url', 'band_url', 'song_url', 'venue_url', 'spotlight']

If you're going to be using 2.2 or 2.3 for a while, I suggest the following
idiom:

    % python2.3
    Python 2.3.3 (#1, Apr  4 2004, 10:12:27) 
    [GCC 3.3 20030304 (Apple Computer, Inc. build 1493)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import csv
    >>> f = open("garageband.csv", "rb") 
    >>> rdr = csv.reader(f)
    >>> rdr = csv.DictReader(f, rdr.next())
    >>> rdr.next()
    {'City': 'Denver', 'band_url': 'http://www.garageband.com/artist/18wheeler', 'Zip': '', 'song_url': '', 'Venue': "Greenfield's", 'Comments': '', 'State': 'Colorado', 'venue_url': 'http://www.jambase.com/search.asp?venueID=12774&di', 'Address': '3355 S Yarrow Street,', 'Date': '2004-10-01', 'Bandname': '18 Wheeler', 'spotlight': 'F', 'gig_url': 'http://www.garageband.com/gigs/profile.html?|pe1|X8TaC3TQ6KinZ1WxZA'}
    >>> rdr.fieldnames
    ['Bandname', 'Date', 'Venue', 'Address', 'City', 'State', 'Zip', 'Comments', 'gig_url', 'band_url', 'song_url', 'venue_url', 'spotlight']
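
The same idiom carries over to Python 3, where the `.next()` method is
replaced by the `next()` builtin. A minimal sketch, assuming a made-up
two-column file held in an `io.StringIO` in place of garageband.csv:

```python
import csv
import io

# Hypothetical two-column data standing in for garageband.csv.
f = io.StringIO('"Bandname","Date"\n"18 Wheeler","2004-10-01"\n')

plain = csv.reader(f)
header = next(plain)              # consume the header row
rdr = csv.DictReader(f, header)   # remaining rows come back as dicts
first = next(rdr)
```

Both readers pull lines from the same file object, so the DictReader
starts at the first data row.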

For writing you still have to pass the fieldnames to the constructor
explicitly, and write the header row yourself:

    fieldnames = [...]
    writer = csv.DictWriter(f, fieldnames)
    writer.writerow(dict(zip(fieldnames, fieldnames)))
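
In later releases (Python 2.7 and 3.2 onwards) DictWriter has a
`writeheader()` method that writes that header row for you. A short sketch,
with hypothetical column names and an in-memory buffer in place of a real
file:

```python
import csv
import io

fieldnames = ["Bandname", "Date"]   # hypothetical columns
out = io.StringIO()                 # stands in for an open output file

writer = csv.DictWriter(out, fieldnames)
writer.writeheader()                # writes the fieldnames as the first row
writer.writerow({"Bandname": "18 Wheeler", "Date": "2004-10-01"})

result = out.getvalue()
```

The fieldnames still have to be given to the constructor; only the manual
`dict(zip(...))` step goes away.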

Skip