[Patches] [ python-Patches-1225769 ] Proposal to implement comment rows in csv module

SourceForge.net noreply at sourceforge.net
Sun Feb 11 19:59:09 CET 2007


Patches item #1225769, was opened at 2005-06-22 14:48
Message generated for change (Comment added) made by montanaro
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1225769&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Library (Lib)
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Iain Haslam (iain_haslam)
Assigned to: Skip Montanaro (montanaro)
Summary: Proposal to implement comment rows in csv module

Initial Comment:
Sometimes csv files contain comment rows, for
temporarily commenting out data or occasionally for
documentation. The current csv module has no built-in
ability to skip rows; in order to skip all lines
beginning with '#', the programmer writes something like:

csv_reader = csv.reader(fp)
for row in csv_reader:
    if not row[0].startswith('#'):    # assuming no blank lines
        print row
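
For readers on Python 3, a minimal sketch of the same row-level filter is shown here. It is not part of the submitted patch; the helper name skip_comment_rows and the sample data are made up for illustration. It guards against blank lines (which csv.reader returns as empty lists) and against an empty first field.

```python
import csv
import io


def skip_comment_rows(rows, commentchar="#"):
    """Yield parsed rows, dropping those whose first field starts with commentchar.

    Hypothetical helper for illustration; not an API of the csv module.
    """
    for row in rows:
        # csv.reader yields [] for a blank line, so check for emptiness first.
        if row and row[0].startswith(commentchar):
            continue
        yield row


# Illustrative data: two comment lines, one blank line, one empty first field.
fp = io.StringIO('# header comment\n"a","b"\n\n"","c"\n#x,y\n')
rows = list(skip_comment_rows(csv.reader(fp)))
for row in rows:
    print(row)
```

Because the filter runs after parsing, a quoted field such as "#x" in the first column is also treated as a comment, and blank lines still come through as empty rows.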

I propose adding a "commentchar" parameter to the csv
parser, so that the above code could be written (more
elegantly, in my opinion):

csv_reader = csv.reader(fp, commentchar='#')
for row in csv_reader:
    print row

This requires only relatively minor changes to the
module, and by defaulting to using no comment
character, existing code will behave as before. If you
are interested, the patch (diffs against current cvs)
required for the second example to run is attached.

Note that the implementation adds SKIPPED_RECORD as a
pseudonym for START_RECORD, because setting status to
START_RECORD after skipping a record would cause a
blank record to be returned.  Altering that behaviour
would cause more changes and the patch would be harder
to review. I've also held back on updating tests and
documentation to reflect this change, pending any
support for it.

It should be irrelevant, but this has been developed on
Debian testing against the cvs head of Python.

----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2007-02-11 12:59

Message:
Logged In: YES 
user_id=44345
Originator: NO

Sorry, I'm coming back to this after a long hiatus...  I'm still not
inclined to make this change to the C source.  I think a) comments in CSV
files are pretty rare and that b) implementing this using a file object
wrapper would be more flexible.

#!/usr/bin/env python

import csv
import StringIO

class CommentedFile:
    def __init__(self, f, commentstring="#"):
        self.f = f
        self.commentstring = commentstring

    def next(self):
        line = self.f.next()
        while line.startswith(self.commentstring):
            line = self.f.next()
        return line

    def __iter__(self):
        return self

f = StringIO.StringIO('''\
"event","performers","start","end","time"
# Rachel Sage
"","Rachael Sage","2008-01-03","2008-01-03","8:00pm"
# Others
"","Tung-N-GRoeVE","2008-01-16","2008-01-16","9:30pm-2:00am"
"","Bossa Nova Beatniks","2007-11-11","2007-11-11","11:11pm"
"","Special Consensus","2006-10-06","2006-10-06",""
''')

for row in csv.reader(CommentedFile(f)):
    print row

The output of the above is as expected:

['event', 'performers', 'start', 'end', 'time']
['', 'Rachael Sage', '2008-01-03', '2008-01-03', '8:00pm']
['', 'Tung-N-GRoeVE', '2008-01-16', '2008-01-16', '9:30pm-2:00am']
['', 'Bossa Nova Beatniks', '2007-11-11', '2007-11-11', '11:11pm']
['', 'Special Consensus', '2006-10-06', '2006-10-06', '']
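
The class above targets Python 2 (StringIO module, next method, print statement). A rough Python 3 equivalent, offered only as a sketch, can be written as a generator, since csv.reader accepts any iterable of strings. The name strip_comment_lines is invented here, and the data is a shortened copy of the sample above.

```python
import csv
import io


def strip_comment_lines(lines, commentstring="#"):
    """Yield only the lines that do not start with commentstring.

    Hypothetical helper; it filters raw text lines before csv parsing.
    """
    for line in lines:
        if not line.startswith(commentstring):
            yield line


f = io.StringIO('''\
"event","performers","start","end","time"
# Rachel Sage
"","Rachael Sage","2008-01-03","2008-01-03","8:00pm"
# Others
"","Special Consensus","2006-10-06","2006-10-06",""
''')

rows = list(csv.reader(strip_comment_lines(f)))
for row in rows:
    print(row)
```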

This has the added benefit that comment lines aren't restricted to single
character comment prefixes.  On the downside, comment characters appearing
at the beginning of a continuation line would trip this up.  In practice, I
don't think it would be a significant limitation.  In almost all cases I
suspect CSV files with embedded comments would be manually created and
maintained and aren't likely to contain fields with embedded comments.

Skip
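
The continuation-line limitation mentioned in this comment can be seen in a small Python 3 sketch. The helper strip_comment_lines and the sample data are assumptions made for illustration. A quoted field spans three physical lines, and the middle one happens to start with '#'.

```python
import csv
import io


def strip_comment_lines(lines, commentstring="#"):
    """Hypothetical line filter applied before csv parsing."""
    for line in lines:
        if not line.startswith(commentstring):
            yield line


# The second record has a quoted field containing embedded newlines.
data = '"id","note"\n"1","first line\n# not a comment\nlast line"\n'

# Parsed directly, the field keeps all three lines.
plain = list(csv.reader(io.StringIO(data)))

# Filtered first, the middle line is dropped from inside the quoted field.
filtered = list(csv.reader(strip_comment_lines(io.StringIO(data))))

print(repr(plain[1][1]))
print(repr(filtered[1][1]))
```

The line filter cannot tell that the '#' line sits inside a quoted field, because it runs before the parser has seen the opening quote.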


----------------------------------------------------------------------

Comment By: Iain Haslam (iain_haslam)
Date: 2005-06-30 18:40

Message:
Logged In: YES 
user_id=1301296

Here are the documentation and test diffs.

I'm glad to hear of the positive feedback. I couldn't find
the csv mailing list (I assume it's not public), so didn't
see the discussion of round-tripping, but I agree with the
implied conclusion that flagging rows as comments would
complicate the interface too much.

On a related point, I noticed that the csv documentation is
filed under "Internet data handling". This seems a little
odd - I'd suggest moving it to "Miscellaneous Services"
alongside fileinput and ConfigParser.

Thanks,
Iain.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2005-06-27 12:25

Message:
Logged In: YES 
user_id=44345

Iain - There was some positive response to your patch from
the csv mailing list (most notably from one of the authors of
the C extension module).  Can you provide diffs for the
module documentation and necessary test cases to go along
with your patch?  Also, the fact that CSV files with comments
(probably?) won't round-trip would be a good thing to note in
the docs.

Skip


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2005-06-26 13:02

Message:
Logged In: YES 
user_id=44345

Something else just occurred to me.  What about writing csv files with
comments?  Also, a tweak to the docs would be in order if this is
accepted.
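
One possible answer to the writing question, sketched in Python 3 under stated assumptions: csv.writer has no notion of comments, so a comment line can only be written straight to the underlying file object between writerow calls. The helper write_comment is hypothetical, and the line terminator is assumed to match the writer's default of '\r\n'.

```python
import csv
import io


def write_comment(fileobj, text, commentchar="#", lineterminator="\r\n"):
    """Write a single comment line directly to the file, bypassing csv.writer.

    Hypothetical helper; text is assumed to contain no newlines.
    """
    fileobj.write(commentchar + " " + text + lineterminator)


out = io.StringIO()
writer = csv.writer(out)  # default dialect, lineterminator '\r\n'

writer.writerow(["event", "start"])
write_comment(out, "Others")
writer.writerow(["Special Consensus", "2006-10-06"])

text = out.getvalue()
print(repr(text))
```

Nothing here escapes data rows that happen to begin with the comment character, so a file written this way is only unambiguous if no first field starts with '#'.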


----------------------------------------------------------------------

Comment By: Iain Haslam (iain_haslam)
Date: 2005-06-26 12:26

Message:
Logged In: YES 
user_id=1301296

> I'm not inclined to clutter the C code with further 
> complications.

Sorry - I haven't been keeping up with the existing
complications! Don't forget that one man's clutter is
another man's functionality. It doesn't actually require
much of a change to the code, although I was slightly
surprised to discover that this module was in C in the first
place...

Basically, I noticed that the csv module has a bias towards
Excel-generated csv files, but most of the time I've come
across csv files, they were hand-edited, and I've often seen
comment fields as described in the submission.

My submission was intended in the "batteries included" 
spirit (I do understand that you stop short of the kitchen
sink), and also seemed in keeping with the
'skipinitialspace' option within the existing csv module.

> Why can't you implement this on an as-needed basis  
> with a file object wrapper [other options]

True, I could do any of those things, but it would be
simpler / clearer not to have to. Of course, if you took
your argument further, you could cut chunks out of several
modules; the argument comes down to whether the benefits
outweigh the additional complexity. I was surprised to
discover the option wasn't already there, but maybe that's
just me.

In any case, if your vote goes from your apparent -0 to -1,
that's your choice, and you're better placed to make it than
I am.

Cheers,
Iain.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2005-06-25 14:39

Message:
Logged In: YES 
user_id=44345

I'm not inclined to clutter the C code with further complications.  Why
can't you implement this on an as-needed basis with a file object wrapper,
a subclass of the csv.reader class, or just continue to use the example in
your submission?



----------------------------------------------------------------------
