csv and mixed lists of unicode and numbers
Benjamin Kaplan
benjamin.kaplan at case.edu
Tue Nov 24 12:50:29 EST 2009
On Tue, Nov 24, 2009 at 11:42 AM, Sibylle Koczian <nulla.epistola at web.de> wrote:
> Hello,
> I want to put data from a database into a tab separated text file. This
> looks like a typical application for the csv module, but there is a
> snag: the rows I get from the database module (kinterbasdb in this case)
> contain unicode objects and numbers. And of course the unicode objects
> contain lots of non-ascii characters.
> If I try to use csv.writer as is, I get UnicodeEncodeErrors. If I use
> the UnicodeWriter from the module documentation, I get TypeErrors with
> the numbers. (I'm using Python 2.6 - upgrading to 3.1 on this machine
> would cause other complications.)
> So do I have to process the rows myself and treat numbers and text
> fields differently? Or what's the best way?
> Here is a small example:
> ########################################################################
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
> import csv, codecs, cStringIO
> import tempfile
>
> cData = [u'Ärger', u'Ödland', 5, u'Süßigkeit', u'élève', 6.9, u'forêt']
>
> class UnicodeWriter:
>     """
>     A CSV writer which will write rows to CSV file "f",
>     which is encoded in the given encoding.
>     """
>     def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
>         # Redirect output to a queue
>         self.queue = cStringIO.StringIO()
>         self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
>         self.stream = f
>         self.encoder = codecs.getincrementalencoder(encoding)()
>
>     def writerow(self, row):
>         self.writer.writerow([s.encode("utf-8") for s in row])
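Try doing this instead:

    self.writer.writerow([s.encode("utf-8") if isinstance(s, unicode) else s for s in row])

That way, only the unicode strings get encoded. The numbers are passed through unchanged, and csv.writer converts them to strings itself.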
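If you need this in more than one place, the same idea can be pulled out into a small helper. This is only a sketch: the name encode_row and the text_type shim are my own, not part of the csv module. The shim exists only so the snippet also runs on Python 3, where text is str rather than unicode.

```python
# -*- coding: utf-8 -*-
try:
    text_type = unicode  # Python 2: the type of u'...' literals
except NameError:
    text_type = str      # Python 3: all text is str


def encode_row(row, encoding="utf-8"):
    """Encode text fields to bytes; leave numbers and other values unchanged.

    Hypothetical helper wrapping the list comprehension suggested above.
    """
    return [s.encode(encoding) if isinstance(s, text_type) else s for s in row]


row = [u'Ärger', u'Ödland', 5, u'Süßigkeit', u'élève', 6.9, u'forêt']
encoded = encode_row(row)
```

Inside UnicodeWriter.writerow you would then call self.writer.writerow(encode_row(row)). This is only needed for the Python 2 csv module. In Python 3, csv works on text, so a row of str and numbers can be written directly to a file opened with an explicit encoding.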
>         # Fetch UTF-8 output from the queue ...
>         data = self.queue.getvalue()
>         data = data.decode("utf-8")
>         # ... and reencode it into the target encoding
>         data = self.encoder.encode(data)
>         # write to the target stream
>         self.stream.write(data)
>         # empty queue
>         self.queue.truncate(0)
>
>     def writerows(self, rows):
>         for row in rows:
>             self.writerow(row)
>
> def writewithcsv(outfile, datalist):
>     wrt = csv.writer(outfile, dialect=csv.excel)
>     wrt.writerow(datalist)
>
> def writeunicode(outfile, datalist):
>     wrt = UnicodeWriter(outfile)
>     wrt.writerow(datalist)
>
> def main():
>     with tempfile.NamedTemporaryFile() as csvfile:
>         print "CSV file:", csvfile.name
>         print "Try with csv.writer"
>         try:
>             writewithcsv(csvfile, cData)
>         except UnicodeEncodeError as e:
>             print e
>         print "Try with UnicodeWriter"
>         writeunicode(csvfile, cData)
>         print "Ready."
>
> if __name__ == "__main__":
>     main()
> ##############################################################################
> Hoping for advice,
> Sibylle
> --
> http://mail.python.org/mailman/listinfo/python-list
More information about the Python-list mailing list