better csv modules and where have object-craft gone?

Tue May 18 11:51:38 EDT 2004

    Tim> I have been using Object Craft's csv module for quite a few
    Tim> projects, mainly because I found the csv module in Python in its
    Tim> current incarnation is functionally inferior to Object Craft's. The
    Tim> Object Craft module, for instance, allowed you to build up csv
    Tim> gradually (i.e. a field at a time, rather than the Python csv
    Tim> module where the writer does the work a record at a time), which
    Tim> isn't always the way I would like to work. Also, I have always had
    Tim> encoding problems (specifically, it doesn't support Unicode, as per
    Tim> the docs) every time I used the Python module, whereas the Object
    Tim> Craft one always worked out of the box.

I guess beauty is in the eye of the beholder.  The Object Craft folks were
key authors of what's in the Python distribution.  If you want to write a
field at a time, you should be able to wrap a csv.writer object in a small
class that adds writefield() and commit() methods.  (csv.writer is a factory
function rather than a class, so it can't be subclassed directly.)  The
first method appends to an internal list.  The second calls writerow() and
clears the list.  Something like this (untested) code might work:

    import csv

    class FieldWriter:
        def __init__(self, *args, **kwds):
            # Delegate to a real csv.writer instead of inheriting from it.
            self.writer = csv.writer(*args, **kwds)
            self.temp = []

        def writefield(self, val):
            # Buffer one field of the row being built.
            self.temp.append(val)

        def commit(self):
            # Emit the buffered fields as one record and start a new row.
            self.writer.writerow(self.temp)
            self.temp = []

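(Be careful.  Fields added since the last commit() are lost unless you call
commit() before discarding the writer.  A __del__ method can do that
cleanup, though it isn't guaranteed to run at interpreter exit.)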
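
One way to avoid losing a partially built row is to flush it when a with
block ends instead of relying on __del__.  The following is only a sketch
under a few assumptions: it targets Python 3, it wraps csv.writer rather
than inheriting from it, and the names FieldWriter, writefield() and
commit() are illustrative, not part of the csv module.

```python
import csv
import io


class FieldWriter:
    """Collect fields one at a time and emit them as a single CSV row."""

    def __init__(self, fileobj, **kwds):
        # Delegate to a csv.writer rather than inheriting from it.
        self._writer = csv.writer(fileobj, **kwds)
        self._pending = []

    def writefield(self, val):
        # Buffer one field of the row being built.
        self._pending.append(val)

    def commit(self):
        # Write the buffered fields as one row, then start a new row.
        # An empty buffer is skipped, so this sketch never writes blank rows.
        if self._pending:
            self._writer.writerow(self._pending)
            self._pending = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Flush a partially built row so it is not silently dropped.
        self.commit()
        return False


buf = io.StringIO()
with FieldWriter(buf) as w:
    w.writefield("name")
    w.writefield("qty")
    w.commit()
    w.writefield("widget, large")
    w.writefield(3)
    # No explicit commit here: __exit__ flushes the last row.

print(buf.getvalue())
```

Flushing in __exit__ keeps the cleanup deterministic, at the cost of
requiring callers to use the writer inside a with block.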

As for lack of Unicode support, that's a known issue.  I suppose it hasn't
been high enough on anyone's list of itches to have attracted any scratching
yet.  Still, you might be able to get most of the way there with a wrapper
that encodes Unicode fields before handing the row to the real writer:

    import csv

    class UnicodeWriter:
        def __init__(self, *args, **kwds):
            # Pull out our own keyword before csv.writer sees it.
            self.encoding = kwds.pop('encoding', 'utf-8')
            self.writer = csv.writer(*args, **kwds)

        def writerow(self, row):
            # Build a new list so the caller's row (possibly a tuple)
            # is left untouched.
            encoded = []
            for f in row:
                if isinstance(f, unicode):
                    f = f.encode(self.encoding)
                encoded.append(f)
            self.writer.writerow(encoded)

I'm almost certain that reading data in multibyte encodings won't work
though, as the low-level reader is byte-oriented instead of
character-oriented.  Patches are welcome to resolve that deficiency.
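
One way around a byte-oriented parser is to decode the stream into text
before the csv module ever sees it.  The sketch below assumes Python 3,
where csv.reader accepts an iterable of text strings, so it does not apply
to the module as it stands at the time of this message.  The sample data is
made up for illustration.

```python
import csv
import io

# Sample bytes in a multibyte encoding (UTF-8 here); the data is made up.
raw = "name,city\r\nZoë,Kraków\r\n".encode("utf-8")

# Decode the byte stream into text before the csv parser sees it.
# newline="" leaves line endings for the csv module to interpret.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
rows = list(csv.reader(text))

print(rows)
```

With a real file, passing encoding= and newline="" to open() plays the same
role as the TextIOWrapper used here.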

Skip