PEP 305 - CSV File API

Andrew Dalke adalke at mindspring.com
Sun Feb 2 13:51:47 EST 2003


Ian Bicking wrote:
> I agree that "write" is not the appropriate method -- I can't ever
> remember seeing a write method that didn't take a string and write it to
> a stream.

Looking through the standard library,

binhex.py: the 'write's there take a string
codecs.py: looks like the 'write' takes an "object", but which is the
    appropriate string representation for the given codec
code.py: the 'write' takes a string
ConfigParser.py: this 'write' writes the state of the ConfigParser
    to an output file.   (I would argue this should have been called
    'save' instead of 'write'.)
doctest.py: the write is used to emulate stdout, so takes a string
gzip.py: emulates a file, so takes a string
quopri.py: internal function, first term is a string
smtpd.py: the 'Devnull' class emulates a file; write takes a string
socket.py: emulates file behaviour; write takes string
StringIO.py: emulates a file;  ...
telnetlib.py: write takes a string
zipfile.py: write takes a string as first param

Most, but not all, of the other packages I scanned take a string.

> Well, there's some that may as a convenience call str() on
> the object passed, but that doesn't significantly change the feel of the
> method.  Using it to write a row definitely seems wrong.  

I found none that do that.  So it isn't frequent.

> But append makes the output seem like a sequence, when it certainly
> isn't -- it's a stream, like a file.  Again, a false cognate.

Hmm....  I have an "X".  You don't know what it is, but I tell you
it's some sort of container which allows forward iteration and
returns 'row' objects.  I also tell you that you can add new row objects
to X, but only one at a time and only after the previous one you added.
If you start appending row objects to an empty X and then read them
again from the start, you get them in the same order.

You may also be able to do other things to X, if you peer into the
internals, but I'm not going to let you.

The question for you is, what is "X"?

It could be a list, which has that behaviour.  It could be a file,
which also has that behaviour.  So could an interface to a SQL database,
and an interface to an object database like ZODB.

So I don't quite think it's a false cognate.  I know it feels
strange, so I'm willing to defer to those with more object modelling
experience than I.

Let me try saying that in another way.  When people use this CSV
API, it looks like

    input = csv.reader(file("initial.csv"))
    output = csv.writer(file("some.csv", "w"))
    for row in input:
      if int(row[0]) > 2:
        output.XXXX(row)

The reader generates a stream of row objects, and the output stream
takes a stream of row objects.  Now replace the input and output
containers with lists

    input = ["1 2 buckle my shoe".split(), "3 4 close the door".split()]
    output = []
    for row in input:
      if int(row[0]) > 2:
        output.append(row)

If there is indeed a valid similarity here, compared to a false
cognate, then the method "XXXX" used above should be named "append".
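
To make the comparison concrete, here is a minimal sketch of the same
loop written once and pointed at either kind of "X".  It assumes a
Python 3 interpreter and the csv module as it exists in the standard
library today, where the writer's row method is named writerow.  The
names AppendWriter and copy_big_rows are hypothetical, made up for this
illustration, and are not part of the PEP or the sandbox code.

```python
import csv
import io


class AppendWriter:
    """Hypothetical adapter: gives a csv writer a list-like append()."""

    def __init__(self, fileobj):
        self._writer = csv.writer(fileobj)

    def append(self, row):
        # Delegate to the writer's row-writing method.
        self._writer.writerow(row)


def copy_big_rows(rows, sink):
    """Append every row whose first field is greater than 2 to sink."""
    for row in rows:
        if int(row[0]) > 2:
            sink.append(row)


rows = ["1 2 buckle my shoe".split(), "3 4 close the door".split()]

# The sink is a plain list.
as_list = []
copy_big_rows(rows, as_list)

# The sink is a CSV stream; the calling code is unchanged.
buffer = io.StringIO()
copy_big_rows(rows, AppendWriter(buffer))
csv_text = buffer.getvalue()
```

The only thing copy_big_rows needs from the sink is an append method,
which is the point of the argument: code that filters rows does not
have to care whether it is filling a list or writing a file.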

> I would prefer writerow(), which implies it's a stream, but does not
> imply it takes a string.


Dave Cole:
 > I like writerow() too.  I think that the reader should probably get a
 > readrow() method so you do not necessarily need to use it like an
 > iterable.

In Python 2.3, files, which are input iterators, implement the
iterator protocol.  The 'next()' method returns the next line.

% python2.3
Python 2.3a1 (#5, Jan  2 2003, 13:29:17)
[GCC 2.96 20000731 (Red Hat Linux 7.2 2.96-112.7.2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> infile = file("/usr/share/dict/words")
 >>> infile.next()
'Aarhus\n'
 >>> infile.next()
'Aaron\n'
 >>>

The input stream of row objects is an iterator, so should provide
its own 'next()' method.
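
As a rough illustration of what "provide its own next()" means, here
is a minimal sketch of a row reader that is its own iterator.  It is
written for Python 3, where the method is spelled __next__ and is
normally called through the builtin next().  RowReader is a
hypothetical name, and the comma split is a deliberate simplification
that ignores quoting; it is not how the sandbox module parses CSV.

```python
class RowReader:
    """Hypothetical reader: yields one list of fields per input line."""

    def __init__(self, lines):
        self._lines = iter(lines)

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # StopIteration from the underlying iterator propagates unchanged.
        line = next(self._lines)
        # Naive split; real CSV quoting rules are deliberately ignored here.
        return line.rstrip("\r\n").split(",")


reader = RowReader(["1,2,buckle\n", "3,4,close\n"])
first = next(reader)   # pull a single row by hand
rest = list(reader)    # or let a consumer drain the remainder
```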

And indeed, if you look at the sandbox implementation
 
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/csv/csv.py?rev=1.12&content-type=text/vnd.viewcvs-markup

you'll see that the readers all implement a 'next' method.  So the
proposal for 'readrow()' is simply an alias to 'next()', except
possibly returning None on end of input rather than raising
StopIteration.
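
A minimal sketch of that reading of 'readrow()', assuming Python 3 and
today's standard library csv module.  The function readrow here is a
hypothetical free-standing helper written for illustration; the
proposal under discussion would make it a method on the reader.

```python
import csv
import io


def readrow(reader):
    """Hypothetical helper: return the next row, or None at end of input."""
    try:
        return next(reader)
    except StopIteration:
        return None


source = io.StringIO("1,2,buckle\n3,4,close\n")
reader = csv.reader(source)

first = readrow(reader)
second = readrow(reader)
third = readrow(reader)   # input is exhausted, so this is None
```

The helper adds nothing beyond translating the end-of-input signal,
which is why it amounts to an alias for 'next()'.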

If you want 'writerow' then it begs for a 'readrow'.  But 'readrow'
is the same as 'next', so why not use 'append' instead of 'writerow'?

Anyway, I've said enough on this topic.  I'll leave the final
decision up to real object gurus.

					Andrew
					dalke at dalkescientific.com




