csv and mixed lists of unicode and numbers

Tue Nov 24 17:55:11 EST 2009

Sibylle Koczian wrote:
> Peter Otten wrote:
>> I'd preprocess the rows as I tend to prefer the simplest approach I can come 
>> up with. Example:
>>
>> import csv
>> import sys
>>
>> def recode_rows(rows, source_encoding, target_encoding):
>>     def recode(field):
>>         if isinstance(field, unicode):
>>             return field.encode(target_encoding)
>>         elif isinstance(field, str):
>>             return unicode(field, source_encoding).encode(target_encoding)
>>         return unicode(field).encode(target_encoding)
>>
>>     return (map(recode, row) for row in rows)
>>
> 
> For this case isinstance really seems to be quite reasonable. And it was
> silly of me not to think of sys.stdout as file object for the example!
> 
>> rows = [[1.23], [u"äöü"], [u"ÄÖÜ".encode("latin1")], [1, 2, 3]]
>> writer = csv.writer(sys.stdout)
>> writer.writerows(recode_rows(rows, "latin1", "utf-8"))
>>
>> The only limitation I can see: target_encoding probably has to be a superset 
>> of ASCII.
>>
> 
> Coping with umlauts and accents is quite enough for me.
> 
> This problem really goes away with Python 3 (tried it on another
> machine), but something else changes too: in Python 2.6 the
> documentation for the csv module explicitly says "If csvfile is a file
> object, it must be opened with the ‘b’ flag on platforms where that
> makes a difference." The documentation for Python 3.1 doesn't have this
> sentence, and if I do that in Python 3.1, I get the following error for
> all sorts of data, even for a list containing only one integer literal:
> 
> TypeError: must be bytes or buffer, not str
> 
> I don't really understand that.
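For comparison, the recode_rows idea quoted above shrinks considerably in Python 3, where the csv module reads and writes str. A minimal sketch, assuming the byte-string fields are Latin-1: only bytes fields still need decoding, because csv.writer already calls str() on numbers, and the output encoding moves to the file object. The name recode_rows_py3 is hypothetical, and io.StringIO stands in for a real text-mode file.

```python
import csv
import io

def recode_rows_py3(rows, source_encoding="latin-1"):
    """Yield each row with bytes fields decoded to str.

    Hypothetical Python 3 counterpart of recode_rows: str fields pass
    through unchanged, and numbers are left alone because csv.writer
    converts non-string fields with str() itself.
    """
    def recode(field):
        if isinstance(field, bytes):
            return field.decode(source_encoding)
        return field

    for row in rows:
        yield [recode(field) for field in row]

rows = [[1.23], ["äöü"], ["ÄÖÜ".encode("latin-1")], [1, 2, 3]]

# io.StringIO stands in for a text-mode file; a real file would be opened
# with something like open(path, "w", encoding="utf-8", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(recode_rows_py3(rows))
print(buffer.getvalue())
```

The decoding step is still worth keeping: an undecoded bytes field would not raise an error, but csv.writer would silently write its repr, b'\xc4\xd6\xdc', instead of the umlauts.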

In Python 3, a file opened in 'b' mode is for reading and writing bytes 
with no encoding/decoding. I believe csv works with files in text mode, 
since it returns and expects strings (text) for reading and writing. 
Perhaps the csv docs should say the file must not be opened in 'b' mode. 
Not sure.
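A minimal Python 3 sketch of the difference: io.BytesIO stands in for a file opened in 'b' mode and rejects the str objects that csv.writer produces, while a text-mode file with an explicit encoding works. The newline="" argument follows the current Python 3 csv documentation, which asks for csv files to be opened that way so the module controls line endings itself; the file name out.csv is arbitrary.

```python
import csv
import io
import os
import tempfile

rows = [[1], ["äöü", 2.5]]

# Binary mode: csv.writer hands str objects to write(), which a bytes
# stream rejects. io.BytesIO stands in for open(path, "wb") here; the
# exact TypeError message differs between Python 3 versions.
binary_failed = False
try:
    csv.writer(io.BytesIO()).writerow([1])
except TypeError as exc:
    binary_failed = True
    print("binary mode fails:", exc)

# Text mode: the file object does the encoding, and newline="" leaves
# line-ending handling to the csv module ("\r\n" by default).
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "out.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)
    with open(path, "rb") as f:
        raw = f.read()

print(raw)  # the UTF-8 bytes actually written to disk
```

Note that the integer and float fields need no preprocessing in either case; the only thing that changes between the two attempts is the mode the file was opened in.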

tjr