DSVWizard.py

Dave Cole djc at object-craft.com.au
Mon Jan 27 06:08:21 CET 2003


> On Sun, 2003-01-26 at 16:33, Skip Montanaro wrote:
> > I'm adding Dave Cole to the distribution list on this note.  Dave,
> > Kevin Altis, Cliff Wells (author of DSV) and I have exchanged a
> > few messages about trying to develop a CSV API for Python.
> > 
> >     >> I suspect most of the differences I see between the DSV and csv
> >     >> modules are due to interpretation differences between Cliff and Dave.
> > 
> >     Cliff> Or a bug in an older version of DSV.  If you have anything that
> >     Cliff> differs using 1.4, please pass it on so I can take a look at it.
> > 
> > I downloaded 1.4 just now.  The sfsample.csv file is now processed
> > identically by the two modules.  The nastiness.csv file generates
> > three differences though:
> > 
> >     % python shootout.py nastiness.csv 
> >     DSV: 0.01 seconds, 13 rows
> >     csv: 0.00 seconds, 13 rows
> >     2
> >     DSV: ['Test 1', 'Fred said "hey!", and left the room', '']
> >     csv: ['Test 1', ' "Fred said ""hey!""', ' and left the room"', ' ""']
> 
> IMO, Dave's is incorrect in this one (unless he has specific reasons
> otherwise).

Andrew (who has been included on the Cc) has tested the behaviour of
Excel (such as it is) and we do the same thing as Excel.  As to
whether Excel is doing the right thing, that is a different question
entirely.

One of the people we have done work for has some very nasty "CSV" data
to parse.  We have been trying to work out what to do to the CSV
module to handle some of the silly things he sees without breaking the
Excel compatibility.

> The original line (from the csv file) is:
> 
> Test 1, "Fred said ""hey!"", and left the room", ""
> 
> The "" at the end is an empty, quoted field.  Maybe someone should
> run this through Excel to see what it claims (I'd be willing to
> accept Dave's interpretation if Excel does it this way, although I'd
> still feel it was incorrect).  I handled this case specifically at a
> user's request.

Andrew, can you run that exact line through Excel?
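
For anyone wanting to reproduce the two readings of that line, here is a
minimal sketch. It assumes the csv module found in the Python standard
library today rather than the exact module under test here, and it uses
that module's skipinitialspace option only as a stand-in for DSV's
whitespace handling; it is not DSV's actual code.

```python
import csv

# The exact line quoted above from nastiness.csv.
line = 'Test 1, "Fred said ""hey!"", and left the room", ""'

# Default (Excel-style) dialect: a quote that follows a leading space does
# not open a quoted field, so the embedded commas split the record.
default_row = next(csv.reader([line]))

# Assumption: skipinitialspace=True approximates the DSV reading by
# dropping the space after each delimiter before looking for a quote.
skip_row = next(csv.reader([line], skipinitialspace=True))

print(default_row)
print(skip_row)
```

The first result should have four fields, as in the csv output above, and
the second three, as in the DSV output.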

> >     10
> >     DSV: ['Test 9', 'no spaces around this', ' but single spaces around this ']
> >     csv: ['Test 9', ' "no spaces around this" ', ' but single spaces around this ']
> >     12
> >     DSV: ['Test 11', 'has no spaces around anything', 'because the data is quoted']
> >     csv: ['   "Test 11"  ', '   "has no spaces around anything"   ', '   "because the data is quoted"    ']
> > 
> > All the three lines have white space immediately following
> > separating commas.  DSV appears to skip over this white space,
> > while csv treats it as part of the field contents.

I am fairly sure that is what Excel does.
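
A rough way to see the whitespace difference, again assuming the csv
module in today's Python standard library. The input line is reconstructed
from the csv output shown for row 10, so treat it as an assumption rather
than the literal contents of nastiness.csv.

```python
import csv

# Assumption: reconstructed from the csv output for row 10 above.
line = 'Test 9, "no spaces around this" , but single spaces around this '

# Default dialect: whitespace after the comma is field data, so the
# quotes are kept as literal characters.
default_row = next(csv.reader([line]))

# skipinitialspace=True drops only the spaces that follow a delimiter;
# the space after the closing quote is still kept as data.
skip_row = next(csv.reader([line], skipinitialspace=True))

print(default_row)
print(skip_row)
```

Note that skipinitialspace also strips the leading space of the unquoted
third field and keeps the space after the closing quote, so it is only an
approximation of the DSV result shown above.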

> Again, this was at a user's request, and is special-case code in DSV
> that can easily be removed.  The user noted, and I concurred, that
> given a quoted field with whitespace around it, the whitespace
> should be ignored.  However, once again I'd be willing to follow
> Excel's lead in this because I'd also consider this to be malformed
> or at least ambiguous data.

Pity there is no real specification for CSV.

> > PS, Just so Dave has the same "test harness", I've attached
> > shootout.py and nastiness.csv.  The shootout.py script now assumes
> > DSV is installed with the package structure of DSV 1.4.0.



-- 
http://www.object-craft.com.au



