[CSV] Re: First Cut at CSV PEP

Dave Cole djc at object-craft.com.au
Wed Jan 29 02:21:42 CET 2003


>>>>> "Cliff" == Cliff Wells <LogiplexSoftware at earthlink.net> writes:

Cliff> On Tue, 2003-01-28 at 16:47, Dave Cole wrote:
>> >>>>> "Cliff" == Cliff Wells <LogiplexSoftware at earthlink.net>
>> writes:
>> 
>> >> Instead of limiting the tweakable options by raising an exception
>> >> we could have an interface which allowed the user to query the
>> >> options normally associated with a dialect.
>> >>
>> >> >> Hmm...  What would be the best way to handle Excel TSV.  Maybe a
>> >> >> new dialect 'excel-tsv'?
>> 
Cliff> So are we leaning towards dialects being done as simple
Cliff> classes?  Will 'excel-tsv' simply be defined as
>>
Cliff> class excel_tsv(excel_2000):
Cliff>     delimiter = '\t'
>>
Cliff> with a dictionary for lookup:
>>
Cliff> settings = { 'excel-tsv': excel_tsv,
Cliff>              'excel-2000': excel_2000, }
>>  Dunno yet.
>> 
>> Here we go again with a potentially bad idea...
>> 
>> I think that there are two things we need to have for each dialect;
>> a set of low level parser configuration, and a set of user
>> tweakables (which correspond to options presented by the
>> application).  The set of user tweakables may not necessarily map
>> one-to-one with low level parser configuration items.
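
The class-based dialects with a dictionary lookup, as quoted above, could
be sketched roughly as follows.  This is only an illustration: the
attributes on excel_2000 other than delimiter, and the get_dialect()
helper, are assumptions and were not proposed anywhere in the thread.

```python
class excel_2000:
    """Hypothetical base dialect; the attribute set is illustrative only."""
    delimiter = ','
    quotechar = '"'
    lineterminator = '\r\n'


class excel_tsv(excel_2000):
    # Identical to excel_2000 except for the field separator.
    delimiter = '\t'


# Registry mapping dialect names to dialect classes.
settings = {
    'excel-tsv': excel_tsv,
    'excel-2000': excel_2000,
}


def get_dialect(name):
    """Look up a dialect class by name (hypothetical helper)."""
    try:
        return settings[name]
    except KeyError:
        raise ValueError('unknown dialect: %r' % (name,))
```

With this layout a new dialect only has to override the attributes that
differ from its parent, and everything else is inherited.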

Cliff> Can you give examples?  I suppose you are referring to things
Cliff> like CR/LF translation and spaces around quotes as being
Cliff> low-level parser configurations and things like delimiters
Cliff> being user-tweakable?

I do not have access to the software at the moment, but not long ago I
used a program called TOAD, a GUI client for Oracle.  One of the things
you could do after executing a query was export the results to a file.
I seem to recall that the export dialog had a number of options which
do not map cleanly onto just one of the settings we would place in our
writer/reader.

I will see if I can get a screen shot of the dialog...
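
To make the distinction concrete, here is a minimal sketch of user
tweakables that do not map one-to-one onto low-level parser settings.
Every option name and setting below is invented for illustration; none
of them is taken from TOAD's export dialog or from any agreed design.

```python
# Hypothetical low-level parser settings for one dialect.
LOW_LEVEL_DEFAULTS = {
    'delimiter': ',',
    'quotechar': '"',
    'doublequote': True,
    'lineterminator': '\r\n',
}

# Hypothetical user-facing options.  Each one may touch one or several
# low-level settings, so the mapping is not one-to-one.
USER_TWEAKABLES = {
    'tab_separated': {'delimiter': '\t'},
    'unix_line_endings': {'lineterminator': '\n'},
    'single_quoted': {'quotechar': "'", 'doublequote': False},
}


def resolve_settings(*enabled):
    """Return the low-level settings after applying the enabled options."""
    resolved = dict(LOW_LEVEL_DEFAULTS)  # copy so the defaults stay intact
    for option in enabled:
        resolved.update(USER_TWEAKABLES[option])
    return resolved
```

An interface like this would also let an application query which options
a dialect exposes, simply by listing the keys of the tweakables table.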

Cliff> Maybe. Currently the sniffing code in DSV just makes a best
Cliff> guess regarding delimiters, text qualifiers and headers.
Cliff> Certainly the dialects could be used to improve its guess (most
Cliff> likely when the sniffed results are ambiguous or fail).

Cliff> Using dialects on import is of less importance if sniffing code
Cliff> is used.  They are two different approaches to the same
Cliff> problem.  If the user specifies the file as Excel compatible,
Cliff> then sniffing seems rather redundant, further, if the file is
Cliff> sniffed and the format discovered, it doesn't seem important
Cliff> which dialect it matches, as long as we are able to use the
Cliff> sniffed parameters to parse it.
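
One way to combine the two approaches is to sniff first and fall back to
a user-specified dialect when sniffing fails.  The sketch below uses the
csv module as it exists in current Python as a stand-in; it is not the
DSV sniffing code, and detect_dialect() is a hypothetical helper.

```python
import csv


def detect_dialect(sample, fallback=csv.excel):
    """Sniff the dialect of `sample`; use `fallback` if sniffing fails."""
    try:
        return csv.Sniffer().sniff(sample)
    except csv.Error:
        # Sniffing was inconclusive, so trust the user-specified dialect.
        return fallback


# Usage: a small tab-separated sample, then an empty one that cannot be sniffed.
tsv_dialect = detect_dialect("a\tb\tc\n1\t2\t3\n4\t5\t6\n")
default_dialect = detect_dialect("")
```

This sketch covers only the failure case; handling ambiguous sniff
results would need the sniffer to report some measure of confidence.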

The sniffer is definitely your area of expertise.  I am just making
stuff up as I go :-)

- Dave

-- 
http://www.object-craft.com.au



