[Tutor] Malformed CSV
Jan Eden
lists at janeden.org
Fri Dec 2 14:50:17 CET 2005
Hi,
I need to parse a CSV file using the csv module:
"hotel","9,463","95","1.00"
"hotels","7,033","73","1.04"
"hotels hamburg","2,312","73","3.16"
"hotel hamburg","2,708","42","1.55"
"Hotels","2,854","41","1.44"
"hotel berlin","2,614","31","1.19"
The idea is to use each single keyword (field 1) as a dictionary key and sum up the clicks (field 2) and transactions (field 3):
try:
    keywords[keyword]['clicks'] += clicks
    keywords[keyword]['transactions'] += transactions
# if the keyword has not been found yet...
except KeyError:
    keywords[keyword] = {'clicks': clicks, 'transactions': transactions}
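For the well-formed rows, the whole loop might look roughly like the sketch below. It assumes the comma in "9,463" is a thousands separator, and the helper names `to_int` and `sum_by_keyword` are made up for illustration only.

```python
import csv
import io

def to_int(text):
    # Assumption: "9,463" uses the comma as a thousands separator.
    return int(text.replace(",", ""))

def sum_by_keyword(lines):
    """Sum clicks (field 2) and transactions (field 3) per keyword (field 1)."""
    keywords = {}
    for row in csv.reader(lines):
        keyword = row[0]
        clicks = to_int(row[1])
        transactions = to_int(row[2])
        try:
            keywords[keyword]['clicks'] += clicks
            keywords[keyword]['transactions'] += transactions
        # if the keyword has not been found yet...
        except KeyError:
            keywords[keyword] = {'clicks': clicks, 'transactions': transactions}
    return keywords

# Two rows from the sample data plus one made-up repeat of "hotel".
sample = io.StringIO(
    '"hotel","9,463","95","1.00"\n'
    '"hotels","7,033","73","1.04"\n'
    '"hotel","1","0","0"\n'
)
totals = sum_by_keyword(sample)
print(totals)
```

Note that "hotel" and "Hotels" would remain separate keys here, since the keyword is used exactly as it appears in the file.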
Unfortunately, the quote characters are not properly escaped within fields:
""hotel,hamburg"","1","0","0"
""hotel,billig, in berlin tegel"","1","0","0"
""hotel+wien"","1","0","0"
""hotel+nürnberg"","1","0","0"
""hotel+london"","1","0","0"
""hotel" "budapest" "billig"","1","0","0"
which leads to the following output (example):
hotel 9,463hamburg""billig 951 in berlin tegel""
As you can see, Python added 'hamburg""' and 'billig' to the first 'hotel' row's click value (9,463), and '1' as well as ' in berlin tegel' to the transactions (95). I am aware that I need to convert real clicks/transactions to integers before adding them, but I first wanted to sort out the parsing problem.
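The mis-parse can be reproduced in isolation with a single bad line. This is only a small sketch using the csv module's default dialect. The doubled quote at the start is read as an empty-looking quoted section, so the keyword is split at its embedded comma and the row comes back with five fields instead of four.

```python
import csv

# One of the malformed lines from the file, unchanged.
bad_line = '""hotel,hamburg"","1","0","0"'

# csv.reader accepts any iterable of strings, so a one-element list works.
row = next(csv.reader([bad_line]))

print(len(row))  # 5 fields instead of the expected 4
print(row)       # ['hotel', 'hamburg""', '1', '0', '0']
```

With a fixed column layout, 'hamburg""' therefore lands in the clicks column and '1' in the transactions column.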
Is there a way to deal with the incorrect quoting automatically?
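One workaround that might apply here, sketched under the assumption that only the keyword field is damaged and that the last three fields never contain a quote character: bypass the csv module and match each line with a regular expression that pins down the three trailing fields. The names `LINE` and `split_line` are hypothetical, and dropping every quote character from the keyword is also an assumption about what the data should look like.

```python
import re

# Everything before the last three quote-free fields is taken as the keyword.
LINE = re.compile(r'^"(.*)","([^"]*)","([^"]*)","([^"]*)"$')

def split_line(line):
    """Split one line into (keyword, clicks, transactions, value)."""
    match = LINE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError("unexpected line format: %r" % line)
    keyword, clicks, transactions, value = match.groups()
    # Assumption: stray quote characters inside the keyword carry no meaning.
    return keyword.replace('"', ''), clicks, transactions, value

print(split_line('""hotel,billig, in berlin tegel"","1","0","0"'))
print(split_line('"hotel","9,463","95","1.00"'))
```

Lines that do not fit the four-field pattern raise a ValueError instead of being silently merged into another row.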
Thanks,
Jan
--
I was gratified to be able to answer promptly, and I did. I said I didn't know. - Mark Twain