[Tutor] Deterimining the maximum length of a field in csv file

Mon Feb 1 06:55:12 CET 2010

I would appreciate some help on this:

I want a script that can

1. determine the fieldnames from a csv file from the first line
2. determine the maximum length of data for each field in that file.

So far I could not figure out how to do 1 and my effort for the second
one is not working as expected.  Here is my present code:

reader = csv.DictReader(open("/media/usb0/kbase/web2py/db_scopus_rou.csv"),delimiter
= ',')
csv.field_size_limit(1000000)
reader.fieldnames  = ["scopus_rou.id","scopus_rou.Authors","scopus_rou.Title",
         "scopus_rou.Year","scopus_rou.Source_title","scopus_rou.Volume",
         "scopus_rou.Issue","scopus_rou.Art_No","scopus_rou.Page_start",
         "scopus_rou.Page_end","scopus_rou.Page_count","scopus_rou.Cited_by",
         "scopus_rou.Link","scopus_rou.Affiliations",
         "scopus_rou.Authors_with_affiliations","scopus_rou.Abstract",
         "scopus_rou.Author_Keywords","scopus_rou.Index_Keywords",
         "scopus_rou.Molecular_Sequence_Numbers","scopus_rou.Chemicals_CAS",
         "scopus_rou.Tradenames","scopus_rou.Manufacturers",
         "scopus_rou.Funding_Details","scopus_rou.Refs",
         "scopus_rou.Correspondence_Address","scopus_rou.Editors",
         "scopus_rou.Sponsors","scopus_rou.Publisher",
         "scopus_rou.Conference_name","scopus_rou.Conference_date",
         "scopus_rou.Conference_location","scopus_rou.Conference_code",
         "scopus_rou.ISSN","scopus_rou.ISBN","scopus_rou.CODEN",
         "scopus_rou.DOI","scopus_rou.Pubmed_ID","scopus_rou.Language",
         "scopus_rou.Abbreviated_Source_Title","scopus_rou.Document_Type",
         "scopus_rou.Source"]

maksimum = { "scopus_rou.id":0,"scopus_rou.Authors":0,"scopus_rou.Title":0,
         "scopus_rou.Year":0,"scopus_rou.Source_title":0,"scopus_rou.Volume":0,
         "scopus_rou.Issue":0,"scopus_rou.Art_No":0,"scopus_rou.Page_start":0,
         "scopus_rou.Page_end":0,"scopus_rou.Page_count":0,"scopus_rou.Cited_by":0,
         "scopus_rou.Link":0,"scopus_rou.Affiliations":0,
         "scopus_rou.Authors_with_affiliations":0,"scopus_rou.Abstract":0,
         "scopus_rou.Author_Keywords":0,"scopus_rou.Index_Keywords":0,
         "scopus_rou.Molecular_Sequence_Numbers":0,"scopus_rou.Chemicals_CAS":0,
         "scopus_rou.Tradenames":0,"scopus_rou.Manufacturers":0,
         "scopus_rou.Funding_Details":0,"scopus_rou.Refs":0,
         "scopus_rou.Correspondence_Address":0,"scopus_rou.Editors":0,
         "scopus_rou.Sponsors":0,"scopus_rou.Publisher":0,
         "scopus_rou.Conference_name":0,"scopus_rou.Conference_date":0,
         "scopus_rou.Conference_location":0,"scopus_rou.Conference_code":0,
         "scopus_rou.ISSN":0,"scopus_rou.ISBN":0,"scopus_rou.CODEN":0,
         "scopus_rou.DOI":0,"scopus_rou.Pubmed_ID":0,"scopus_rou.Language":0,
         "scopus_rou.Abbreviated_Source_Title":0,"scopus_rou.Document_Type":0,
         "scopus_rou.Source":0}
ry = 0
try:
    for row in reader:
        ry = ry + 1
        for k in  reader.fieldnames:
            try:
                lengte = len(row[k].strip())
            except:
                lengte = 0
            if k in maksimum:
                if lengte > maksimum[k]:
                    maksimum[k]= lengte
                else:
                    maksimum[k] = lengte
        print maksimum
except:
    pass

for l in maksimum.keys:
    print ("%s: %d\n" % (l, maksimum(l)))

Regards
Johann