First Cut at CSV PEP

Dave Cole djc at object-craft.com.au
Wed Jan 29 00:43:33 CET 2003


>>>>> "Cliff" == Cliff Wells <LogiplexSoftware at earthlink.net> writes:

Cliff> On Mon, 2003-01-27 at 20:56, Dave Cole wrote:
>> I only have one issue with the PEP as it stands.  It is still
>> aiming too low.  One of the things that we support in our parser is
>> the ability to handle CSV without quote characters.
>> 
>> field1,field2,field3\, field3,field4
>> 
>> One of our customers has data like the above.  To handle this we
>> would need something like the following:
>> 
>> # Use the 'raw' dialect to get access to all tweakables.
>> writer(fileobj, dialect='raw', quotechar=None, delimiter=',',
>> escapechar='\\')

Cliff> +1 on escapechar, -1 on 'raw' dialect.

See below.

Cliff> Why would a 'raw' dialect be needed?  It isn't clear to me why
Cliff> escapechar would be mutually exclusive with any particular
Cliff> dialect.  Further, not specifying a dialect (dialect=None)
Cliff> should be the default which would seem the same as 'raw'.

>> I think that we need some way to handle a potentially different set
>> of options on each dialect.

Cliff> I'm not understanding how this is different from Skip's
Cliff> suggestion to use

Cliff> reader(fileobj, dialect="excel2000", delimiter='\t')

Cliff> Or are you suggesting that not all options would be available
Cliff> on all dialects?  Can you suggest an example?

I think it is important to keep in mind the users of the module who
are not experts in the various dialects of CSV.  If presented with a
flat list of every supported option, they are going to engage in a
fair amount of head scratching.

If we try to make things easier for users by mirroring the options
that their application presents, then they are going to have a much
easier time working out how to use the module for their specific
problem.  By limiting the available options based upon the dialect
specified by the user, we will be doing them a favour.
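
As a rough sketch of what limiting options per dialect could look
like: the table ALLOWED_OPTIONS and the function restricted_writer
below are hypothetical names made up for illustration, and the dialect
names 'excel' and 'excel-tab' are borrowed from the csv module as it
later shipped in the standard library, not from the PEP draft.

```python
import csv

# Hypothetical table: which keyword options each dialect lets you tweak.
# None stands in for the unrestricted 'raw' parser discussed in this thread.
ALLOWED_OPTIONS = {
    "excel": set(),
    "excel-tab": set(),
    None: {"delimiter", "quotechar", "escapechar", "quoting", "lineterminator"},
}


def restricted_writer(fileobj, dialect=None, **options):
    """Return a csv writer, rejecting options the dialect does not expose."""
    unexpected = set(options) - ALLOWED_OPTIONS[dialect]
    if unexpected:
        raise TypeError(
            "dialect %r does not expose: %s" % (dialect, ", ".join(sorted(unexpected)))
        )
    if dialect is None:
        # No dialect named: every tweakable is passed straight through.
        return csv.writer(fileobj, **options)
    return csv.writer(fileobj, dialect, **options)
```

With this, restricted_writer(f, dialect='excel', delimiter='\t') is
refused, while the same option with dialect=None is accepted.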

The point of the 'raw' dialect is to expose the full capabilities of
the raw parser.  Maybe we should use None rather than 'raw'.
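
For illustration, here is how the quote-free customer data from above
can be handled with the csv module as it eventually shipped in the
standard library.  Note the assumption: that module spells "no quote
characters" as quoting=csv.QUOTE_NONE rather than the quotechar=None
proposed here, and it has no 'raw' dialect.

```python
import csv
import io

# The customer-style line from earlier in the thread: no quote characters,
# with a backslash escaping the embedded delimiter.
data = "field1,field2,field3\\, field3,field4\n"

# Read without any quote handling; the escapechar protects the comma.
reader = csv.reader(
    io.StringIO(data), delimiter=",", quoting=csv.QUOTE_NONE, escapechar="\\"
)
rows = list(reader)

# Write the rows back out the same way.
out = io.StringIO()
writer = csv.writer(
    out,
    delimiter=",",
    quoting=csv.QUOTE_NONE,
    escapechar="\\",
    lineterminator="\n",
)
writer.writerows(rows)
round_tripped = out.getvalue()
```

The third field comes back as a single value containing the comma, and
writing it out again re-inserts the backslash.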

>> When you CSV export from Excel, do you have the ability to use a
>> delimiter other than comma?  Do you have the ability to change the
>> quotechar?

Cliff> I think it is an option to save as a TSV file (IIRC), which is
Cliff> the same as a CSV file, but with tabs.

Hmm...  What would be the best way to handle Excel TSV?  Maybe a new
dialect 'excel-tsv'?
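
A minimal sketch of such a dialect, assuming the registration API of
the csv module as it later shipped in the standard library
(csv.register_dialect and the csv.excel base class).  The name
'excel-tsv' is the one floated here; the shipped module calls its
built-in equivalent 'excel-tab'.

```python
import csv
import io

# Derive a tab-delimited variant from the Excel dialect; everything
# except the delimiter is inherited from csv.excel.
csv.register_dialect("excel-tsv", csv.excel, delimiter="\t")

row = ["name", "a value, with comma", 'say "hi"']

buf = io.StringIO()
csv.writer(buf, dialect="excel-tsv").writerow(row)
tsv_line = buf.getvalue()

# Reading the line back with the same dialect recovers the original row.
parsed = next(csv.reader(io.StringIO(tsv_line), dialect="excel-tsv"))
```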

>> Should the wrapper protect you from yourself so that when you
>> select the Excel dialect you are limited to the options available
>> within Excel?

Cliff> No.  I think this would be unnecessarily limiting.

I am not saying that the wrapper should absolutely prevent someone
from using options not available in the application.  If you want to
break the dialect, then maybe it should be a two step process.

    csvwriter = csv.writer(file("newnastiness.csv", "w"),
                           dialect='excel2000')
    csvwriter.setparams(delimiter='"')
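
One way the two step process could be sketched, purely as an
assumption about how it might be built: TwoStepWriter is a
hypothetical wrapper class, and setparams is the method proposed
above, not something the standard library csv writer provides.

```python
import csv


class TwoStepWriter:
    """Hypothetical wrapper: choose a dialect first, override it explicitly later."""

    def __init__(self, fileobj, dialect="excel"):
        self._fileobj = fileobj
        self._dialect = dialect
        self._overrides = {}
        self._writer = csv.writer(fileobj, dialect)

    def setparams(self, **params):
        # Step two: deliberately depart from the dialect chosen in step one.
        self._overrides.update(params)
        self._writer = csv.writer(self._fileobj, self._dialect, **self._overrides)

    def writerow(self, row):
        return self._writer.writerow(row)
```

The constructor accepts only a dialect, so any departure from it has to
go through an explicit setparams call.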

- Dave

-- 
http://www.object-craft.com.au



