[Tutor] Malformed CSV
Kent Johnson
kent37 at tds.net
Fri Dec 2 15:29:42 CET 2005
Jan Eden wrote:
> Hi,
>
> I need to parse a CSV file using the csv module:
>
> "hotel","9,463","95","1.00"
> "hotels","7,033","73","1.04"
> "hotels hamburg","2,312","73","3.16"
> "hotel hamburg","2,708","42","1.55"
> "Hotels","2,854","41","1.44"
> "hotel berlin","2,614","31","1.19"
>
> Unfortunately, the quote characters are not properly escaped within fields:
>
> ""hotel,hamburg"","1","0","0"
> ""hotel,billig, in berlin tegel"","1","0","0"
> ""hotel+wien"","1","0","0"
> ""hotel+nürnberg"","1","0","0"
> ""hotel+london"","1","0","0"
> ""hotel" "budapest" "billig"","1","0","0"
>
> Is there a way to deal with the incorrect quoting automatically?
I'm not entirely sure how you want to interpret the data above. One possibility is to replace each doubled quote ("") with a single quote (") before handing the data to the csv module. For example:
# data is the raw data from the whole file
data = '''""hotel,hamburg"","1","0","0"
""hotel,billig, in berlin tegel"","1","0","0"
""hotel+wien"","1","0","0"
""hotel+nurnberg"","1","0","0"
""hotel+london"","1","0","0"
""hotel" "budapest" "billig"","1","0","0"'''
data = data.replace('""', '"')
data = data.splitlines()
import csv
# csv.reader accepts any iterable of lines, so a list of strings works
for line in csv.reader(data):
    print(line)
Output is
['hotel,hamburg', '1', '0', '0']
['hotel,billig, in berlin tegel', '1', '0', '0']
['hotel+wien', '1', '0', '0']
['hotel+nurnberg', '1', '0', '0']
['hotel+london', '1', '0', '0']
['hotel "budapest" "billig"', '1', '0', '0']
which looks pretty reasonable except for the last line, and I don't really know what you would consider correct there.
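One caveat with the blanket replace: your real file mixes well-formed and malformed rows, and a legitimately empty field ("") in a well-formed row would also be collapsed. Here is a rough sketch that applies the fix only to lines that look malformed. The function name parse_rows and the "starts with a doubled quote" heuristic are my own assumptions, and the third sample row with an empty field is made up for illustration, so adjust it to whatever your data actually looks like:

```python
import csv

def parse_rows(text):
    """Parse CSV text, collapsing doubled quotes only on lines that need it."""
    rows = []
    for line in text.splitlines():
        # Heuristic (an assumption): a malformed line opens with "" followed
        # by something other than a comma, so a leading empty field '"",'
        # is left alone.
        if line.startswith('""') and not line.startswith('"",'):
            line = line.replace('""', '"')
        # Parse one line at a time; csv.reader takes any iterable of strings.
        rows.extend(csv.reader([line]))
    return rows

# Hypothetical sample: one good row, one malformed row, one row with an
# empty field that the blanket replace would have damaged.
sample = '''"hotel","9,463","95","1.00"
""hotel,hamburg"","1","0","0"
"hotels","","73","1.04"'''

for row in parse_rows(sample):
    print(row)
```

This still treats the "budapest" line the same way as above, so it does not settle what the right answer is there.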
Kent
--
http://www.kentsjohnson.com