csv and mixed lists of unicode and numbers
Terry Reedy
tjreedy at udel.edu
Tue Nov 24 17:55:11 EST 2009
Sibylle Koczian wrote:
> Peter Otten schrieb:
>> I'd preprocess the rows as I tend to prefer the simplest approach I can come
>> up with. Example:
>>
>> def recode_rows(rows, source_encoding, target_encoding):
>>     def recode(field):
>>         if isinstance(field, unicode):
>>             return field.encode(target_encoding)
>>         elif isinstance(field, str):
>>             return unicode(field, source_encoding).encode(target_encoding)
>>         return unicode(field).encode(target_encoding)
>>
>>     return (map(recode, row) for row in rows)
>>
>
> For this case isinstance really seems to be quite reasonable. And it was
> silly of me not to think of sys.stdout as file object for the example!
>
>> rows = [[1.23], [u"äöü"], [u"ÄÖÜ".encode("latin1")], [1, 2, 3]]
>> writer = csv.writer(sys.stdout)
>> writer.writerows(recode_rows(rows, "latin1", "utf-8"))
>>
>> The only limitation I can see: target_encoding probably has to be a superset
>> of ASCII.
>>
>
> Coping with umlauts and accents is quite enough for me.
>
> This problem really goes away with Python 3 (tried it on another
> machine), but something else changes too: in Python 2.6 the
> documentation for the csv module explicitly says "If csvfile is a file
> object, it must be opened with the ‘b’ flag on platforms where that
> makes a difference." The documentation for Python 3.1 doesn't have this
> sentence, and if I do that in Python 3.1 I get for all sorts of data,
> even for a list with only one integer literal:
>
> TypeError: must be bytes or buffer, not str
>
> I don't really understand that.
In Python 3, a file opened in 'b' mode reads and writes bytes, with no
encoding or decoding. I believe csv works with files opened in text mode,
since it returns strings when reading and expects strings when writing.
Perhaps the csv doc should say that the file must not be opened in 'b'
mode. Not sure.
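
A minimal sketch of the difference, assuming Python 3 and using in-memory
streams in place of real files (the names rows, text_buffer and
binary_buffer are illustrative, not from the thread). It is meant only to
show that csv.writer passes str objects to the stream's write method, so a
text-mode stream works and a bytes-only stream raises TypeError:

```python
import csv
import io

# Mixed numbers and non-ASCII text, similar to the rows discussed above.
rows = [[1.23], ["äöü"], [1, 2, 3]]

# Text-mode stream: csv.writer writes str, which the stream accepts.
# For a real file the equivalent would be
# open(path, "w", newline="", encoding="utf-8").
text_buffer = io.StringIO(newline="")
csv.writer(text_buffer).writerows(rows)
text_output = text_buffer.getvalue()

# Bytes-only stream (like a file opened with 'b'): the same call fails,
# because the stream's write method rejects str.
binary_buffer = io.BytesIO()
try:
    csv.writer(binary_buffer).writerows(rows)
    binary_error = None
except TypeError as exc:
    binary_error = exc

print(repr(text_output))
print(type(binary_error).__name__)
```

The exact wording of the TypeError message differs between Python 3
releases, so the sketch only checks the exception type.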
tjr