csv and mixed lists of unicode and numbers
Peter Otten
__peter__ at web.de
Tue Nov 24 14:04:42 EST 2009
Sibylle Koczian wrote:
> I want to put data from a database into a tab separated text file. This
> looks like a typical application for the csv module, but there is a
> snag: the rows I get from the database module (kinterbasdb in this case)
> contain unicode objects and numbers. And of course the unicode objects
> contain lots of non-ascii characters.
>
> If I try to use csv.writer as is, I get UnicodeEncodeErrors. If I use
> the UnicodeWriter from the module documentation, I get TypeErrors with
> the numbers. (I'm using Python 2.6 - upgrading to 3.1 on this machine
> would cause other complications.)
>
> So do I have to process the rows myself and treat numbers and text
> fields differently? Or what's the best way?
I'd preprocess the rows, as I tend to prefer the simplest approach I can come
up with. Example:
import csv
import sys

def recode_rows(rows, source_encoding, target_encoding):
    def recode(field):
        if isinstance(field, unicode):
            return field.encode(target_encoding)
        elif isinstance(field, str):
            return unicode(field, source_encoding).encode(target_encoding)
        return unicode(field).encode(target_encoding)
    return (map(recode, row) for row in rows)

rows = [[1.23], [u"äöü"], [u"ÄÖÜ".encode("latin1")], [1, 2, 3]]
writer = csv.writer(sys.stdout)
writer.writerows(recode_rows(rows, "latin1", "utf-8"))
The only limitation I can see: target_encoding probably has to be a superset
of ASCII.
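As an aside, since you mention a possible upgrade: under Python 3 the csv
module works with str (unicode) values natively, and encoding is handled by
the file object itself, so the recoding step disappears entirely. A minimal
sketch (writing to an in-memory buffer here just for illustration; with a
real file you would use open(path, "w", encoding="utf-8", newline="")):

import csv
import io

# Python 3: csv.writer accepts str and numbers directly; the encoding
# is chosen when the underlying file is opened, not inside csv.
rows = [[1.23], ["äöü"], ["ÄÖÜ"], [1, 2, 3]]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(rows)
print(buf.getvalue())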
Peter