[Tutor] Malformed CSV
Jan Eden
lists at janeden.org
Fri Dec 2 16:26:33 CET 2005
Kent Johnson wrote on 02.12.2005:
>I'm not entirely sure how you want to interpret the data above. One
>possibility is to just change the double "" to single " before
>processing with csv. For example:
>
># data is the raw data from the whole file
>data = '''""hotel,hamburg"","1","0","0"
>""hotel,billig, in berlin tegel"","1","0","0"
>""hotel+wien"","1","0","0"
>""hotel+nurnberg"","1","0","0"
>""hotel+london"","1","0","0"
>""hotel" "budapest" "billig"","1","0","0"'''
>
>data = data.replace('""', '"')
>data = data.splitlines()
>
>import csv
>
>for line in csv.reader(data):
>    print(line)
>
>Output is
>['hotel,hamburg', '1', '0', '0']
>['hotel,billig, in berlin tegel', '1', '0', '0']
>['hotel+wien', '1', '0', '0']
>['hotel+nurnberg', '1', '0', '0']
>['hotel+london', '1', '0', '0']
>['hotel "budapest" "billig"', '1', '0', '0']
>
>which looks pretty reasonable except for the last line, and I don't
>really know what you would consider correct there.
>
Exactly, the last line is the problem. With correct (Excel-style) quoting, it would look like this:
"""hotel"" ""budapest"" ""billig""","1","0","0"
i.e. each quote within a field would be doubled, and the output would be:
['"hotel" "budapest" "billig"', '1', '0', '0']
i.e. the quoting of the original search string
"hotel" "budapest" "billig"
would be preserved (and this is important). I guess I need to notify the engineer responsible for the CSV output and have the quoting corrected.
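
To illustrate the round trip described above, here is a minimal sketch. It assumes Python 3 and only the standard csv and io modules; the helper names write_row and read_row are made up for this example and are not part of the actual export. It writes the search string with Excel-style quoting and reads it back unchanged:

```python
import csv
import io


def write_row(fields):
    """Serialize one row with every field quoted, Excel-style (hypothetical helper)."""
    buf = io.StringIO(newline="")
    # QUOTE_ALL wraps each field in quotes; embedded quotes are doubled by default.
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(fields)
    return buf.getvalue().rstrip("\r\n")


def read_row(line):
    """Parse a single CSV line back into a list of fields (hypothetical helper)."""
    return next(csv.reader([line]))


search = '"hotel" "budapest" "billig"'
encoded = write_row([search, "1", "0", "0"])
decoded = read_row(encoded)

print(encoded)  # """hotel"" ""budapest"" ""billig""","1","0","0"
print(decoded)  # ['"hotel" "budapest" "billig"', '1', '0', '0']
```

If the export side doubled embedded quotes this way, the csv module should return the original search string without any preprocessing.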
Thanks,
Jan
--
Any sufficiently advanced technology is indistinguishable from a Perl script. - Programming Perl