[CSV] Re: First Cut at CSV PEP
Cliff Wells
LogiplexSoftware at earthlink.net
Wed Jan 29 02:11:33 CET 2003
On Tue, 2003-01-28 at 16:47, Dave Cole wrote:
> >>>>> "Cliff" == Cliff Wells <LogiplexSoftware at earthlink.net> writes:
>
> >> Instead of limiting the tweakable options by raising an exception
> >> we could have an interface which allowed the user to query the
> >> options normally associated with a dialect.
> >>
> >> >> Hmm... What would be the best way to handle Excel TSV. Maybe a
> >> >> new dialect 'excel-tsv'?
>
> Cliff> So are we leaning towards dialects being done as simple
> Cliff> classes? Will 'excel-tsv' simply be defined as
>
> Cliff> class excel_tsv(excel_2000):
> Cliff>     delimiter = '\t'
>
> Cliff> with a dictionary for lookup:
>
> Cliff> settings = { 'excel-tsv': excel_tsv,
> Cliff>              'excel-2000': excel_2000,
> Cliff>            }
>
> Dunno yet.
>
> Here we go again with a potentially bad idea...
>
> I think that there are two things we need to have for each dialect; a
> set of low level parser configuration, and a set of user tweakables
> (which correspond to options presented by the application). The set
> of user tweakables may not necessarily map one-to-one with low level
> parser configuration items.
Can you give examples? I suppose you are referring to things like CR/LF
translation and spaces around quotes as being low-level parser
configurations and things like delimiters being user-tweakable?
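To make the question concrete, here is a rough sketch of how the two
layers might look if dialects are simple classes. Everything in it is
an assumption for illustration: the attribute names, the `tweakable`
tuple, and the helpers `parser_config` and `user_tweakables` are
hypothetical, and the split between low-level and tweakable options
is only my guess from above. It follows the earlier suggestion of
letting the user query the options rather than raising an exception.

```python
# Hypothetical sketch: all names below are illustrative, not an agreed API.

class excel_2000:
    # Low-level parser configuration.
    delimiter = ','
    quotechar = '"'
    lineterminator = '\r\n'     # CR/LF handling
    skipinitialspace = False    # spaces after the delimiter
    # Names of the options an application would expose to the user.
    tweakable = ('delimiter', 'quotechar')


class excel_tsv(excel_2000):
    # Only the delimiter differs; everything else is inherited.
    delimiter = '\t'


# Lookup table from dialect name to dialect class.
settings = {
    'excel-2000': excel_2000,
    'excel-tsv': excel_tsv,
}

PARSER_OPTIONS = ('delimiter', 'quotechar', 'lineterminator',
                  'skipinitialspace')


def parser_config(name):
    """Return every low-level parser option for the named dialect."""
    dialect = settings[name]
    return {opt: getattr(dialect, opt) for opt in PARSER_OPTIONS}


def user_tweakables(name):
    """Return only the options the user may change, with current values."""
    dialect = settings[name]
    return {opt: getattr(dialect, opt) for opt in dialect.tweakable}
```

In this sketch the tweakables are a strict subset of the parser
options. If they don't map one-to-one, as suggested above, the
`tweakable` tuple would have to become a mapping of some kind.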
>
> How would we do this in Python?
>
> >> Should we have a sniffer in the module?
>
> Cliff> This hasn't been brought up, but of course one of the major
> Cliff> selling points of DSV is the "sniffing" code. However, I think
> Cliff> I'm with Dave on having another layer (CSVutils) that would
> Cliff> contain this sort of thing.
>
> Any sniffer would have to be able to traverse the set of dialects
> implemented in the CSV module and look inside them to understand
> which options are available to a dialect.
Maybe. Currently the sniffing code in DSV just makes a best guess
regarding delimiters, text qualifiers and headers. Certainly the
dialects could be used to improve its guess (most likely when the
sniffed results are ambiguous or fail).
Using dialects on import is of less importance if sniffing code is
used. They are two different approaches to the same problem. If the
user specifies the file as Excel compatible, then sniffing seems rather
redundant. Further, if the file is sniffed and the format discovered, it
doesn't seem important which dialect it matches, as long as we are able
to use the sniffed parameters to parse it.
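A minimal sketch of the sniff-then-parse approach follows. It does not
use DSV's sniffing code; it assumes the `csv.Sniffer` class from the
csv module that now ships in Python's standard library, and the sample
data is made up. The point is only that the sniffed parameters are fed
straight to the parser without matching them to a named dialect.

```python
import csv
import io

# Made-up sample: semicolon-delimited, no quoting.
sample = "name;qty\r\nwidget;3\r\ngadget;5\r\n"

# Guess the format; restricting the candidate delimiters keeps the
# guess from being ambiguous on such a small sample.
sniffed = csv.Sniffer().sniff(sample, delimiters=";,\t")

# Parse with the sniffed parameters directly. No named dialect is
# looked up at any point.
rows = list(csv.reader(io.StringIO(sample, newline=""), dialect=sniffed))
```

If the user had said up front that the file was Excel compatible, the
`sniff()` call would be skipped and a named dialect passed instead.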
--
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308 (800) 735-0555 x308