[Tutor] Determining the maximum length of a field in csv file
Johann Spies
johann.spies at gmail.com
Mon Feb 1 06:55:12 CET 2010
I would appreciate some help on this:
I want a script that can
1. determine the field names of a csv file from its first line
2. determine the maximum length of data for each field in that file.
So far I could not figure out how to do 1 and my effort for the second
one is not working as expected. Here is my present code:
import csv

reader = csv.DictReader(open("/media/usb0/kbase/web2py/db_scopus_rou.csv"),
                        delimiter=',')
# Allow very long fields (e.g. abstracts and reference lists).
csv.field_size_limit(1000000)
reader.fieldnames = ["scopus_rou.id","scopus_rou.Authors","scopus_rou.Title",
"scopus_rou.Year","scopus_rou.Source_title","scopus_rou.Volume",
"scopus_rou.Issue","scopus_rou.Art_No","scopus_rou.Page_start",
"scopus_rou.Page_end","scopus_rou.Page_count","scopus_rou.Cited_by",
"scopus_rou.Link","scopus_rou.Affiliations",
"scopus_rou.Authors_with_affiliations","scopus_rou.Abstract",
"scopus_rou.Author_Keywords","scopus_rou.Index_Keywords",
"scopus_rou.Molecular_Sequence_Numbers","scopus_rou.Chemicals_CAS",
"scopus_rou.Tradenames","scopus_rou.Manufacturers",
"scopus_rou.Funding_Details","scopus_rou.Refs",
"scopus_rou.Correspondence_Address","scopus_rou.Editors",
"scopus_rou.Sponsors","scopus_rou.Publisher",
"scopus_rou.Conference_name","scopus_rou.Conference_date",
"scopus_rou.Conference_location","scopus_rou.Conference_code",
"scopus_rou.ISSN","scopus_rou.ISBN","scopus_rou.CODEN",
"scopus_rou.DOI","scopus_rou.Pubmed_ID","scopus_rou.Language",
"scopus_rou.Abbreviated_Source_Title","scopus_rou.Document_Type",
"scopus_rou.Source"]
# One counter per field name, all starting at 0.
maksimum = dict.fromkeys(reader.fieldnames, 0)
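ry = 0  # number of rows read so far
try:
    for row in reader:
        ry = ry + 1
        for k in reader.fieldnames:
            try:
                lengte = len(row[k].strip())
            except AttributeError:
                # Short rows give None for the missing fields.
                lengte = 0
            if k in maksimum:
                if lengte > maksimum[k]:
                    maksimum[k] = lengte
            else:
                maksimum[k] = lengte
    print(maksimum)
except csv.Error as err:
    # Report parse errors instead of silently ignoring them.
    print("row %d: %s" % (ry, err))
for l in maksimum.keys():
    print("%s: %d\n" % (l, maksimum[l]))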
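
For comparison, the two steps listed at the top (field names from the first line, then the maximum length per field) can be sketched more compactly. This is only a minimal sketch under stated assumptions, not a tested solution for the Scopus export. It assumes Python 3 and relies on csv.DictReader taking the field names from the first row when none are passed in. The function name max_field_lengths and the sample data are hypothetical and used only for illustration. Note that when the field names are assigned by hand, as in the code above, the header line is read as an ordinary data row.

```python
import csv
import io


def max_field_lengths(csv_file, delimiter=","):
    """Return (fieldnames, {fieldname: length of longest stripped value})."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    # With no fieldnames argument, DictReader takes them from the first row.
    # An empty file gives None, hence the fallback to an empty list.
    fieldnames = reader.fieldnames or []
    longest = dict.fromkeys(fieldnames, 0)
    for row in reader:
        for name in fieldnames:
            # Rows with too few fields yield None for the missing ones.
            value = row[name] or ""
            longest[name] = max(longest[name], len(value.strip()))
    return fieldnames, longest


# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    "id,Authors,Title\n"
    "1,Smith J.,A short title\n"
    "2,\"Jones, A.; Brown, B.\",Another\n"
    "3,Lee\n"
)
names, lengths = max_field_lengths(sample)
for name in names:
    print("%s: %d" % (name, lengths[name]))
```

For a real file, the same function could be given an object opened with open(path, newline=""), and csv.field_size_limit can still be raised beforehand if some fields are very long.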
Regards
Johann