[Tutor] Urgent: unicode problems writing CSV file

Wed Jun 8 13:19:33 EDT 2016

On 08/06/2016 14:54, Alex Hall wrote:
> All,
> I'm working on a project that writes CSV files, and I have to get it done
> very soon. I've done this before, but I'm suddenly hitting a problem with
> unicode conversions. I'm trying to write data, but getting the standard
> cannot encode character: ordinal not in range(128)
> 
> I've tried
> str(info).encode("utf8")
> str(info).decode("utf8")
> unicode(info, "utf8")
> csvFile = open("myFile.csv", "wb", encoding="utf-8")  # invalid keyword argument
> 
> What else can I do? As I said, I really have to get this working soon, but
> I'm stuck on this stupid unicode thing. Any ideas will be great. Thanks.
> 

This is a little tricky. I assume that you're on Python 2.x (since
open() isn't taking an encoding). Deep in the bowels of the csv module's
C implementation is code which converts every item in the row it's
receiving to a string (essentially: [str(x) for x in row]). That
conversion assumes ascii: there's no opportunity to specify an encoding.
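
To see the failure in isolation, here's a minimal sketch. The sample
value is made up, and I'm assuming the default encoding is ascii, which
is the usual case on Python 2. It spells out the implicit encode that
str() performs on a unicode object there:

```python
# Hypothetical cell value containing one non-ascii character (e-acute).
value = u"caf\xe9"

try:
    # On Python 2, str(value) implicitly does the equivalent of this
    # encode with the default (ascii) codec, and fails the same way.
    value.encode("ascii")
    failed = False
    message = ""
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)

# Encoding explicitly as UTF-8 succeeds and gives a byte string,
# which is what the Python 2 csv module can write safely.
encoded = value.encode("utf-8")
```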

For things whose __str__ returns something ascii-ish, that's fine. But
if your data does or is likely to contain non-ascii data, you'll need to
preprocess it. How you do it, and how general-purpose that approach is
will depend on your data. For the purposes of discussion, let's assume
your data looks like this:

unicode, int, int

Then your encoder could do this:

def encoder_of_rows(row):
  return [row[0].encode("utf-8"), str(row[1]), str(row[2])]

and your csv processor could do this:

import csv

rows = [...]
# "wb": on Python 2 the csv module expects a file opened in binary mode
with open("filename.csv", "wb") as f:
  writer = csv.writer(f)
  writer.writerows([encoder_of_rows(row) for row in rows])

but it could be more (or less) complex than that, depending on your data
and how much you know about it.
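
If you don't know in advance which columns hold unicode, a more general
variant of encoder_of_rows might look like the sketch below. The names
encode_cell and encode_row are my own inventions, and I'm assuming that
UTF-8 is the encoding you want in the output file:

```python
import sys

# The text type is `unicode` on Python 2 and `str` on Python 3.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def encode_cell(value, encoding="utf-8"):
    """Encode text to a byte string; return any other value unchanged."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


def encode_row(row, encoding="utf-8"):
    """Encode every text cell so the csv writer never sees raw unicode."""
    return [encode_cell(cell, encoding) for cell in row]


# Hypothetical row in the "unicode, int, int" shape discussed above.
sample = encode_row([u"caf\xe9", 3, 4])
```

On Python 2 you would then call
writer.writerows(encode_row(row) for row in rows) in place of the
encoder_of_rows call. Non-text cells are passed through untouched on
purpose, so the csv module still applies its own conversion to numbers
and None.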

TJG