From cjm at ava.com.au Wed Oct 13 16:32:14 2004
From: cjm at ava.com.au (Chris Munchenberg)
Date: Thu, 14 Oct 2004 00:02:14 +0930
Subject: [Csv] PEP 305
Message-ID: <416D3C6E.10305@ava.com.au>

Hi,

I have been using the csv module extensively, and find it very useful.
The only thing is that many of the csv files I use have headers, and the
DictReader doesn't cope with them well. I've started using a slightly
modified version. Feel free to do whatever you like with it: use it, or
delete it immediately from your inbox.

Chris Munchenberg.

================================================================

from csv import reader

class MyDictReader:
    def __init__(self, f, fieldnames=None, restkey=None, restval=None,
                 dialect="excel", *args, **kwds):
        f.seek(0)
        self.reader = reader(f, dialect, *args, **kwds)
        self.f = f
        if fieldnames:
            self.fieldnames = fieldnames   # list of keys for the dict
            self.has_header = False
        else:
            # use the header row as keys for the dictionary
            self.fieldnames = self.reader.next()
            self.has_header = True
        self.restkey = restkey             # key to catch long rows
        self.restval = restval             # default value for short rows

    def __iter__(self):
        return self

    def next(self):
        row = self.reader.next()
        # unlike the basic reader, we prefer not to return blanks,
        # because we will typically wind up with a dict full of None
        # values
        while row == []:
            row = self.reader.next()
        d = dict(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)
        if lf < lr:
            # extra fields go into a list under restkey
            d[self.restkey] = row[lf:]
        elif lf > lr:
            # missing fields are filled with restval
            for key in self.fieldnames[lr:]:
                d[key] = self.restval
        return d

    def reset(self):
        # rewind; skip the header line only if __init__ consumed one
        self.f.seek(0)
        if self.has_header:
            self.f.readline()

From skip at pobox.com Sun Oct 17 15:51:39 2004
From: skip at pobox.com (Skip Montanaro)
Date: Sun, 17 Oct 2004 08:51:39 -0500
Subject: [Csv] PEP 305
In-Reply-To: <416D3C6E.10305@ava.com.au>
References: <416D3C6E.10305@ava.com.au>
Message-ID: <16754.30955.440451.414391@montanaro.dyndns.org>

    Chris> The only thing is that many of the csv files I use have headers -
    Chris> and the Dictionary Reader doesn't cope with them well.
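The long/short-row handling in MyDictReader (extra fields collected under restkey, missing fields filled with restval, blank lines skipped) follows the same rules as the standard csv.DictReader. A minimal sketch of those rules, assuming a current Python 3 interpreter rather than the 2.x releases discussed in this thread, and using made-up sample data:

```python
import csv
import io

# Hypothetical sample: a header row, one long row, a blank line, one short row.
data = "a,b,c\n1,2,3,4,5\n\n6\n"

rdr = csv.DictReader(io.StringIO(data), restkey="extra", restval="")
rows = list(rdr)

# Long row: fields beyond the header are collected in a list under restkey.
print(rows[0])  # {'a': '1', 'b': '2', 'c': '3', 'extra': ['4', '5']}
# The blank line is skipped; the short row is padded with restval.
print(rows[1])  # {'a': '6', 'b': '', 'c': ''}
```

Only two dicts come back: the empty line produces no row at all.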
In 2.4 this gets better. If the fieldnames argument to the DictReader
constructor is omitted (None), the first row of the file is taken as the
column headers:

    % head -3 garageband.csv
    "Bandname","Date","Venue","Address","City","State","Zip","Comments","gig_url","band_url","song_url","venue_url","spotlight"
    "18 Wheeler","2004-10-01","Greenfield's","3355 S Yarrow Street,","Denver","Colorado","","","http://www.garageband.com/gigs/profile.html?|pe1|X8TaC3TQ6KinZ1WxZA","http://www.garageband.com/artist/18wheeler","","http://www.jambase.com/search.asp?venueID=12774&di","F"
    "18 Wheeler","2004-10-02","Greenfield's","3355 S Yarrow Street","Denver","Colorado","","","http://www.garageband.com/gigs/profile.html?|pe1|X8TaC3TQ6KinZ1Wxaw","http://www.garageband.com/artist/18wheeler","","http://www.jambase.com/search.asp?venueID=12774&di","F"
    % python
    Python 2.4a3 (#47, Sep 11 2004, 13:51:15)
    [GCC 3.3 20030304 (Apple Computer, Inc. build 1493)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import csv
    >>> rdr = csv.DictReader(open("garageband.csv", "rb"))
    >>> rdr.next()
    {'City': 'Denver', 'band_url': 'http://www.garageband.com/artist/18wheeler', 'Zip': '', 'song_url': '', 'Venue': "Greenfield's", 'Comments': '', 'State': 'Colorado', 'venue_url': 'http://www.jambase.com/search.asp?venueID=12774&di', 'Address': '3355 S Yarrow Street,', 'Date': '2004-10-01', 'Bandname': '18 Wheeler', 'spotlight': 'F', 'gig_url': 'http://www.garageband.com/gigs/profile.html?|pe1|X8TaC3TQ6KinZ1WxZA'}
    >>> rdr.next()
    {'City': 'Denver', 'band_url': 'http://www.garageband.com/artist/18wheeler', 'Zip': '', 'song_url': '', 'Venue': "Greenfield's", 'Comments': '', 'State': 'Colorado', 'venue_url': 'http://www.jambase.com/search.asp?venueID=12774&di', 'Address': '3355 S Yarrow Street', 'Date': '2004-10-02', 'Bandname': '18 Wheeler', 'spotlight': 'F', 'gig_url': 'http://www.garageband.com/gigs/profile.html?|pe1|X8TaC3TQ6KinZ1Wxaw'}
    >>> rdr.fieldnames
    ['Bandname', 'Date', 'Venue', 'Address', 'City',
     'State', 'Zip', 'Comments', 'gig_url', 'band_url', 'song_url', 'venue_url', 'spotlight']

If you're going to be using 2.2 or 2.3 for a while, I suggest the following
idiom, which reads the header row with a plain csv.reader and passes it to
DictReader as the fieldnames:

    % python2.3
    Python 2.3.3 (#1, Apr 4 2004, 10:12:27)
    [GCC 3.3 20030304 (Apple Computer, Inc. build 1493)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import csv
    >>> f = open("garageband.csv", "rb")
    >>> rdr = csv.reader(f)
    >>> rdr = csv.DictReader(f, rdr.next())
    >>> rdr.next()
    {'City': 'Denver', 'band_url': 'http://www.garageband.com/artist/18wheeler', 'Zip': '', 'song_url': '', 'Venue': "Greenfield's", 'Comments': '', 'State': 'Colorado', 'venue_url': 'http://www.jambase.com/search.asp?venueID=12774&di', 'Address': '3355 S Yarrow Street,', 'Date': '2004-10-01', 'Bandname': '18 Wheeler', 'spotlight': 'F', 'gig_url': 'http://www.garageband.com/gigs/profile.html?|pe1|X8TaC3TQ6KinZ1WxZA'}
    >>> rdr.fieldnames
    ['Bandname', 'Date', 'Venue', 'Address', 'City', 'State', 'Zip', 'Comments', 'gig_url', 'band_url', 'song_url', 'venue_url', 'spotlight']

For writing you still have to declare the fieldnames explicitly to the
constructor and write the header row yourself:

    fieldnames = [...]
    writer = csv.DictWriter(f, fieldnames)
    writer.writerow(dict(zip(fieldnames, fieldnames)))

Skip
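For comparison, the 2.4 behaviour (no fieldnames means the first row supplies the keys) carries over to Python 3. A minimal sketch, assuming Python 3 and a trimmed, in-memory stand-in for garageband.csv (only four of its columns). With a real file on Python 3 you would open it in text mode with newline="" instead of "rb".

```python
import csv
import io

# A trimmed stand-in for garageband.csv: Bandname, Date, Venue and City only.
data = (
    '"Bandname","Date","Venue","City"\n'
    '"18 Wheeler","2004-10-01","Greenfield\'s","Denver"\n'
    '"18 Wheeler","2004-10-02","Greenfield\'s","Denver"\n'
)

# fieldnames omitted: DictReader takes the keys from the first row.
rdr = csv.DictReader(io.StringIO(data))
first = next(rdr)
print(rdr.fieldnames)                  # ['Bandname', 'Date', 'Venue', 'City']
print(first["Date"], first["Venue"])   # 2004-10-01 Greenfield's
```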
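The 2.2/2.3 idiom works because both readers pull lines from the same file object: the plain reader consumes the header line, and the DictReader picks up from the next one. A minimal Python 3 rendering of the same idea, assuming an in-memory file with two of the garageband columns (next(reader) replaces the 2.x reader.next()):

```python
import csv
import io

data = "Bandname,Date\n18 Wheeler,2004-10-01\n18 Wheeler,2004-10-02\n"
f = io.StringIO(data)

# Read the header row with a plain reader, then hand it to DictReader,
# which continues from the same position in f.
header = next(csv.reader(f))
rdr = csv.DictReader(f, header)
rows = list(rdr)
print(rows[0])  # {'Bandname': '18 Wheeler', 'Date': '2004-10-01'}
```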
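Later releases (2.7 and 3.2 onward) added DictWriter.writeheader(), which writes the same header row as the dict(zip(fieldnames, fieldnames)) idiom above. A small Python 3 check of that equivalence; the column list and row here are hypothetical:

```python
import csv
import io

fieldnames = ["Bandname", "Date"]  # hypothetical column list
row = {"Bandname": "18 Wheeler", "Date": "2004-10-01"}

# Idiom from the message: write a dict mapping each name to itself.
out1 = io.StringIO()
w1 = csv.DictWriter(out1, fieldnames)
w1.writerow(dict(zip(fieldnames, fieldnames)))
w1.writerow(row)

# Newer equivalent: writeheader() emits the same header row.
out2 = io.StringIO()
w2 = csv.DictWriter(out2, fieldnames)
w2.writeheader()
w2.writerow(row)

print(out1.getvalue() == out2.getvalue())  # True
```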