[Csv] What's our status?

Cliff Wells LogiplexSoftware at earthlink.net
Fri Feb 28 00:20:17 CET 2003


On Thu, 2003-02-27 at 10:00, Skip Montanaro wrote:
>     Cliff> Sounds like a possibility.  But what about:
> 
>     Cliff> $1,234;Wells,Cliff
> 
>     Cliff> where ; is the delimiter?
> 
> Oh, I'm sure we can always construct perfectly reasonable (that is, not "red
> team") examples where any of these heuristics fail.  That's why it's best to
> use the sniffers as hints, not the word of God.

Agreed.  But I'd still like to think of some clever way of resolving the
above.

> How about returning a list of candidate delimiters, ordered from most likely
> to least likely?  How about counting the number of cells generated using
> different candidate delimiters and returning the candidate which creates the
> most cells or average row lengths with the smallest standard deviation?  How

This is basically what it does now, except for the most-cells bit,
which I consider too unreliable.  As long as the number of cells per
row is supposed to be fairly consistent, it should work.
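
A rough sketch of that consistency check in present-day Python, for
illustration only.  It is not the DSV code: the function name, the
candidate set, and the sample rows (other than the first) are made up
here, and quoting is ignored entirely.  It ranks candidates by the
standard deviation of the per-row cell count, so it also returns the
ordered list suggested above.

```python
import statistics


def rank_delimiters(sample, candidates=",;\t|: "):
    """Rank candidate delimiters, most consistent cell count per row first.

    Hypothetical helper: splits naively and does not understand quoting.
    """
    rows = [line for line in sample.splitlines() if line.strip()]
    if not rows:
        return []
    scored = []
    for delim in candidates:
        # Cells per row if this candidate were the delimiter.
        counts = [row.count(delim) + 1 for row in rows]
        if max(counts) == 1:
            continue  # candidate never occurs in the sample
        scored.append((statistics.pstdev(counts), delim))
    # sorted() is stable, so ties keep the order given in `candidates`.
    return [delim for _, delim in sorted(scored, key=lambda item: item[0])]


# The first row is the example from this thread; the others are invented.
sample = "\n".join([
    "$1,234;Wells,Cliff",
    "$987;Montanaro,Skip",
    "$12,345,678;McNamara,Andrew",
])
ranking = rank_delimiters(sample)
```

With only the single row from the thread, both "," and ";" give a
perfectly consistent count, so the tie falls back to candidate order.
That is exactly the ambiguity in question; it only resolves once more
rows are available.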

> about allowing the user to specify a sample cell value which occurs in the
> data (e.g., sample="benzene" in Andrew's example, which allows you to easily
> identify SPC as the delimiter)?

Returning a list is a possibility.  I considered it when developing DSV
but couldn't think of a good use for it since the user was going to
confirm the selections anyway via the dialog.
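
The sample-value idea quoted above could look something like the
following.  This is only a sketch under assumptions: the function
name, the candidate set, and the two data rows are invented for
illustration (Andrew's actual example is not reproduced here).  It
collects whichever candidate characters sit directly next to a known
cell value.

```python
def delimiters_from_sample(text, sample, candidates=",;\t|: "):
    """Return candidate delimiters found right beside a known cell value.

    Hypothetical helper: looks only at the single character before and
    after each occurrence of `sample` in `text`.
    """
    found = []
    start = text.find(sample)
    while start != -1:
        end = start + len(sample)
        # Character just before and just after this occurrence, if any.
        neighbours = text[max(start - 1, 0):start] + text[end:end + 1]
        for ch in neighbours:
            if ch in candidates and ch not in found:
                found.append(ch)
        start = text.find(sample, end)
    return found


# Invented space-delimited data containing the cell value "benzene".
text = "1 benzene 78.11\n2 toluene 92.14\n"
hints = delimiters_from_sample(text, "benzene")
```

A cell at the start or end of a row contributes only one neighbour,
since the line break is not in the candidate set.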

> I've never seen any spreadsheet-like application guess the delimiter without
> some user input.  Importing CSV files in Gnumeric is rather fun.  You select
> the delimiters and watch it split the input on-the-fly.  It's cool to see it
> go from one jumbled column of data to a nicely aligned spreadsheet.

Hmph.  And DSV gets no credit for doing the same? <wink>  Actually,
Excel (and DSV) take a pretty good stab at the delimiter and then let
you modify their guesses via a preview dialog.  That's pretty much how
I always intended the sniffer to be used, so I suppose I shouldn't
worry about it too much.  Can't seem to help it though ;)


BTW, as far as making utils a sub-package of csv, do you intend this:

csv.utils (contains all utils in csv/utils.py)

or do you mean:

csv.utils.sniffer (csv/utils/sniffer.py, etc)

I personally prefer the latter, as I can see utils encompassing a lot
of stuff, perhaps not all of it directly related, and a single utils.py
file would become rather large.  However, my packaging skills aren't
the greatest, so I'm a bit confused as to what __init__.py should
contain so that we aren't required to type "from csv import csv"
instead of just "import csv".
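
One common answer to the __init__.py question is to have it re-export
the public names from the implementation module.  The sketch below
builds a throwaway package to show the layout.  Everything in it is
hypothetical: the package is called demo_csv to avoid clashing with
the real csv module, and _core.py, reader() and sniff() are stand-ins
rather than the project's actual code.  It uses the explicit relative
import syntax of current Python.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout:
#   demo_csv/__init__.py        re-exports the public API
#   demo_csv/_core.py           stand-in for the real implementation
#   demo_csv/utils/__init__.py  empty, marks utils as a sub-package
#   demo_csv/utils/sniffer.py   stand-in sniffer
FILES = {
    "demo_csv/__init__.py": "from ._core import reader\n",
    "demo_csv/_core.py": (
        "def reader(lines, delimiter=','):\n"
        "    return [line.split(delimiter) for line in lines]\n"
    ),
    "demo_csv/utils/__init__.py": "",
    "demo_csv/utils/sniffer.py": (
        "def sniff(sample):\n"
        "    return ';' if ';' in sample else ','\n"
    ),
}


def build_package(root):
    """Write the demo package files under the directory `root`."""
    for rel_path, source in FILES.items():
        path = Path(root) / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)


root = tempfile.mkdtemp()
build_package(root)
sys.path.insert(0, root)
importlib.invalidate_caches()

# A plain "import demo_csv" is enough to reach reader(), because
# __init__.py pulled it up from _core.
import demo_csv
from demo_csv.utils import sniffer

rows = demo_csv.reader(["a;b", "c;d"], delimiter=sniffer.sniff("a;b"))
```

With this layout each utility lives in its own file under utils/, and
users never need to know that reader() is defined in _core.py.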



-- 
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308


