[Tutor] using python to read csv clean record and write out csv

Oscar Benjamin oscar.j.benjamin at gmail.com
Tue Nov 6 01:46:17 CET 2012


On 2 November 2012 10:40, Sacha Rook <sacharook at gmail.com> wrote:
>
> I have a problem with a csv file from a supplier. They export data to csv,
> but the last column in each record is a description which is marked up
> with html.
>
> I am trying to automate the processing of this csv to upload elsewhere in a
> usable format. If I open the csv with csved it looks like all the records
> aren't escaped correctly, as after a while I find html tags and text on the
> next line/record.
>
> If I 'openwith' excel the description stays on the correct line/record?
>
> I want to use python to read these records in and output a valid csv with
> the descriptions intact, preferably without the html tags, i.e. a string of
> text formatted with newline/CR where appropriate.
>
> So far I have this but don't know where to go from here. Can someone help me?
>
> import csv
>
> infile = open(r'c:\data\input.csv', 'rb')    # raw strings keep the backslashes literal
> outfile = open(r'c:\data\output.csv', 'wb')
>
> reader = csv.reader(infile)
> writer = csv.writer(outfile)
>
>
> for line in reader:
>     print line
>     writer.writerow(line)
>

You already have a program. Does it work? If not, then what's wrong
with the output?

If you get an error message can you please show the exact error message?

> I have attached the input.csv; I hope this is good form here?
>
> I know I am far from complete but don't know how to proceed :-)

It's okay to send attachments when there is a need to. It would be
good though to cut the csv file down to a smaller size before posting
it here. That's 4 MB wasting space in a lot of inboxes. Better yet,
you could copy the first three lines directly into the email so that
people can see it without needing to download the attachment.
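
One way to grab those first few lines is sketched below. This is only a
minimal illustration: the helper name head is made up for this example, and
the path in the usage comment is simply the one from your script. It reads
raw lines rather than csv records, so a description with an embedded newline
will show up split across lines, which is probably what csved is showing you.

```python
import io
from itertools import islice


def head(path, n=3):
    """Return the first n raw lines of the file at path.

    Hypothetical helper for illustration only. newline='' keeps the
    original line endings, so a stray '\r' is visible in the repr.
    """
    with io.open(path, 'r', newline='', errors='replace') as f:
        return list(islice(f, n))


# Usage (path assumed from the script above):
#     for line in head(r'c:\data\input.csv'):
#         print(repr(line))
```

Printing the repr of each line rather than the line itself makes the quote
characters and line endings explicit, which is what matters when a record
seems to spill onto the next line.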


Oscar

