DSVWizard.py

Cliff Wells LogiplexSoftware at earthlink.net
Mon Jan 27 18:02:04 CET 2003


On Sun, 2003-01-26 at 21:18, Dave Cole wrote:
> > I'm adding Dave Cole to the distribution list on this note.  Dave,
> > Kevin Altis, Cliff Wells (author of DSV) and I have exchanged a few
> > messages about trying to develop a CSV API for Python.
> 
> Python having a CSV API would be an excellent thing.  The most
> difficult problem to solve is how to expose all of the CSV variations
> so that users can work out how to drive the module.
> 
> I suppose the first step would be to catalogue all of the common CSV
> variations and give them names.  Naming variations after the
> applications which produce them could be the best way.

That doesn't sound like a bad idea, but the task of cataloging all those
applications seems a bit daunting, especially since I suspect that
between all of us we can account for only a handful of them.  I suppose
we could have a place for users to submit CSV samples from applications
they want supported.  The fact of the matter is that, despite there
being no real standard, there seem to be only minor differences between
formats: delimiter, quote style, and whether spaces are allowed around
quotes.  A programmer who knows the specific style of the data he's
importing could specify how to process the file via attributes or flags.
For the general case, DSV already has heuristics for determining the
first two (delimiter and quote style), and adding code to test for the
third shouldn't be too difficult.  Another problem with specifying
styles by application name is that many apps let the user choose parts
of the style (usually the delimiter), so an application name doesn't
pin down the format either.
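To make the three style settings concrete, here is a minimal pure-Python
sketch. The names `parse_line` and `guess_delimiter` are hypothetical
(they are not DSV's or Dave's API), the parser does not handle newlines
inside quoted fields, and the delimiter guess is a deliberately simple
heuristic (pick the candidate that appears the same nonzero number of
times on every line), not DSV's actual algorithm.

```python
def parse_line(line, delimiter=",", quotechar='"', skip_initial_space=False):
    """Split one CSV line into fields using three style settings.

    Hypothetical sketch: handles doubled quotes ("") inside quoted fields,
    but not newlines embedded in quoted fields.
    """
    fields = []
    field = []
    in_quotes = False
    at_field_start = True
    i = 0
    n = len(line)
    while i < n:
        ch = line[i]
        if in_quotes:
            if ch == quotechar:
                if i + 1 < n and line[i + 1] == quotechar:
                    field.append(quotechar)  # doubled quote -> literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)
        elif ch == delimiter:
            fields.append("".join(field))
            field = []
            at_field_start = True
        elif at_field_start and skip_initial_space and ch == " ":
            pass  # ignore spaces at the start of a field (before any quote)
        elif at_field_start and ch == quotechar:
            in_quotes = True  # opening quote only counts at field start
            at_field_start = False
        else:
            field.append(ch)
            at_field_start = False
        i += 1
    fields.append("".join(field))
    return fields


def guess_delimiter(lines, candidates=",;\t|:"):
    """Return the candidate that occurs equally often (and most often)
    on every non-blank line, or None. Simplistic: ignores quoting."""
    best, best_count = None, 0
    for cand in candidates:
        counts = {line.count(cand) for line in lines if line.strip()}
        if len(counts) == 1:
            (count,) = counts
            if count > best_count:
                best, best_count = cand, count
    return best
```

For example, `parse_line('a, "b, c",d', skip_initial_space=True)` gives
`['a', 'b, c', 'd']`, while the same call without the flag treats the
quote after the space as ordinary data and splits on the inner comma.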

I think what I'm leaning towards at this time, if everyone is in
agreement, is for Dave or me to reimplement Dave's code (and API) in
Python so that there is a pure Python implementation, and then provide
Dave's C module as a faster alternative (much like pickle and cPickle).
The heuristics of DSV would be an optional feature, along with the GUI.
Someone is already working on porting the wxPython GUI code to Qt,
but it would be useful for a Tk port to appear as well (I'm *not*
volunteering for that).  I also have serious doubts about the GUI
getting added to the core (even a Tk version), so it would have to be
spun off and maintained separately on SourceForge.  I also expect that
if a csv module were added to the Python library, I could get Robin Dunn
to add the GUI for it to the wxPython libraries.
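The pickle/cPickle arrangement usually comes down to an import fallback.
A sketch, assuming a hypothetical compiled module named `_csv_accel` (no
such module is implied to exist) that exposes the same `parse_line`
function as the pure-Python version; the fallback here is a naive split
used only to keep the example short.

```python
try:
    # Hypothetical C accelerator; the module name is illustrative only.
    from _csv_accel import parse_line
    IMPLEMENTATION = "C"
except ImportError:
    # Pure-Python fallback exposing the same function name and signature.
    def parse_line(line, delimiter=","):
        """Naive fallback: splits on the delimiter and ignores quoting."""
        return line.rstrip("\r\n").split(delimiter)

    IMPLEMENTATION = "Python"
```

Callers simply use `parse_line` and never need to know which
implementation was loaded, which is the same idiom as
`try: import cPickle as pickle` / `except ImportError: import pickle`.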

As for DSV's current API, I'm not too attached to it, and I think
that it could be mimicked sufficiently by adding a parser.parseall()
method to Dave's API so the programmer would have the option of getting
the entire file as a list without having to write a loop.
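A rough sketch of what a `parseall()` convenience method could look
like, layered on a per-line `parse()` method. The `Parser` class and its
naive delimiter splitting are hypothetical stand-ins, not Dave's actual
parser.

```python
class Parser:
    """Hypothetical line-oriented parser used to illustrate parseall()."""

    def __init__(self, delimiter=","):
        self.delimiter = delimiter

    def parse(self, line):
        # Naive split; a real parser would also handle quoting.
        return line.rstrip("\r\n").split(self.delimiter)

    def parseall(self, lines):
        """Parse every line of an iterable (e.g. an open file) into a list."""
        return [self.parse(line) for line in lines]
```

For example, `Parser().parseall(["a,b\n", "1,2\n"])` returns
`[['a', 'b'], ['1', '2']]`, so the caller gets the whole input as a list
without writing a loop.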

Something I'd also like to see, and I think Kevin mentioned this, is a
generator interface for retrieving the data line by line.
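A generator version could look like the sketch below. `iter_records` is
a hypothetical name, and the naive split again stands in for a real
quote-aware parser. Unlike a `parseall()`-style call, it yields one
record at a time, so the whole file never has to be held in memory as a
list.

```python
def iter_records(lines, delimiter=","):
    """Lazily yield each record (a list of fields) from an iterable of lines."""
    for line in lines:
        # Naive split; quoting and embedded newlines are not handled here.
        yield line.rstrip("\r\n").split(delimiter)
```

Typical use is to loop over `iter_records(f)` for an open file `f`,
processing each record as it arrives.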

I think that this would provide the most complete set of features and
best performance options.

-- 
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308



