[Csv] What's our status?
Cliff Wells
LogiplexSoftware at earthlink.net
Fri Feb 28 00:20:17 CET 2003
On Thu, 2003-02-27 at 10:00, Skip Montanaro wrote:
> Cliff> Sounds like a possibility. But what about:
>
> Cliff> $1,234;Wells,Cliff
>
> Cliff> where ; is the delimiter?
>
> Oh, I'm sure we can always construct perfectly reasonable (that is, not "red
> team") examples where any of these heuristics fail. That's why it's best to
> use the sniffers as hints, not the word of God.
Agreed. But I'd still like to think of some clever way of resolving the
above.
> How about returning a list of candidate delimiters, ordered from most likely
> to least likely? How about counting the number of cells generated using
> different candidate delimiters and returning the candidate which creates the
> most cells or average row lengths with the smallest standard deviation? How
This is basically what it does now, except for the most-cells bit,
which I consider too unreliable. As long as the number of cells per row
is supposed to be fairly consistent, it should work.
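
To make the consistency idea concrete, here is a minimal sketch. It is not the DSV sniffer itself; the function name `rank_delimiters`, the candidate set, and the extra sample rows are all assumptions made up for illustration. It counts each candidate per line and prefers the one whose count varies least from line to line, ignoring quoting entirely.

```python
import statistics

def rank_delimiters(lines, candidates=",;\t |:"):
    """Rank candidate delimiters, most consistent per-line count first.

    Hypothetical helper: a naive count that does not understand quoting.
    """
    scored = []
    for delim in candidates:
        counts = [line.count(delim) for line in lines]
        mean = statistics.mean(counts)
        if mean == 0:
            continue  # candidate never appears in the sample
        spread = statistics.pstdev(counts)  # 0.0 means every line agrees
        # Sort by smallest spread; break ties in favour of the more frequent one.
        scored.append((spread, -mean, delim))
    scored.sort()
    return [delim for _, _, delim in scored]

# First row is the example from this thread; the other rows are invented.
sample = [
    "$1,234;Wells,Cliff",
    "$987;Montanaro,Skip",
    "$12,345,678;McNamara,Andrew",
]
ranked = rank_delimiters(sample)
```

With these three rows the semicolon appears exactly once per line while the comma count varies, so `;` ranks first. Given only the single row `$1,234;Wells,Cliff`, both candidates have zero spread and the tie-break picks the comma, which is exactly the failure case above.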
> about allowing the user to specify a sample cell value which occurs in the
> data (e.g., sample="benzene" in Andrew's example, which allows you to easily
> identify SPC as the delimiter)?
Returning a list is a possibility. I considered it when developing DSV
but couldn't think of a good use for it since the user was going to
confirm the selections anyway via the dialog.
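
For what it's worth, the sample-cell hint Skip describes could be sketched roughly as below. The helper name `delimiters_near` and the data are invented for illustration (Andrew's actual file is not reproduced here), and the sketch assumes the sample value is not wrapped in quote characters.

```python
from collections import Counter

def delimiters_near(text, sample):
    """Tally the characters found immediately before and after `sample`.

    Hypothetical helper: returns neighbouring characters, most common first.
    """
    neighbours = Counter()
    start = text.find(sample)
    while start != -1:
        end = start + len(sample)
        # Slices yield "" at the edges of the text instead of raising.
        for ch in (text[start - 1:start], text[end:end + 1]):
            if ch and ch not in "\r\n":  # line ends are not field delimiters
                neighbours[ch] += 1
        start = text.find(sample, end)
    return [ch for ch, _ in neighbours.most_common()]

# Made-up space-delimited data containing the sample value "benzene".
data = "1 benzene 78.11\n2 toluene 92.14\n3 benzene 78.11\n"
hint = delimiters_near(data, "benzene")
```

Here every occurrence of the sample is flanked by spaces, so the space is the only candidate returned. If the value were quoted, the quote character would show up instead, so this is a hint at best.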
> I've never seen any spreadsheet-like application guess the delimiter without
> some user input. Importing CSV files in Gnumeric is rather fun. You select
> the delimiters and watch it split the input on-the-fly. It's cool to see it
> go from one jumbled column of data to a nicely aligned spreadsheet.
Hmph. And DSV gets no credit for doing the same? <wink> Actually,
Excel (and DSV) make a pretty good stab at the delimiter and then let
you modify their guesses via a preview dialog. That's pretty much how I
always intended the sniffer to be used, so I suppose maybe I shouldn't
worry about it too much. Can't seem to help it though ;)
BTW, as far as making utils a sub-package of csv, do you intend this:
csv.utils (contains all utils in csv/utils.py)
or do you mean:
csv.utils.sniffer (csv/utils/sniffer.py, etc)
I personally prefer the latter, as I can see utils encompassing a lot of
stuff, perhaps not all of it directly related, and a utils.py file would
become rather large. However, my packaging skills aren't the greatest,
so I'm a bit confused as to what __init__.py should contain so that we
aren't required to type "from csv import csv" instead of just "import
csv".
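
One way the second layout could work is sketched below, assuming the usual approach of re-exporting the core names from __init__.py. The package name `csvdemo` is hypothetical (chosen so the sketch does not shadow the real csv module), the `reader` and `sniff` bodies are placeholders, and the explicit relative import is present-day Python syntax. The script writes the package to a temporary directory and imports it.

```python
import importlib
import os
import sys
import tempfile

PKG = "csvdemo"  # hypothetical stand-in for the "csv" package

files = {
    # Core module, standing in for csv/csv.py (placeholder implementation).
    PKG + "/csv.py":
        "def reader(lines):\n    return [l.split(',') for l in lines]\n",
    # Re-export the core names so a plain "import csvdemo" is enough.
    PKG + "/__init__.py": "from .csv import reader\n",
    # Sub-package layout: csvdemo.utils.sniffer lives in utils/sniffer.py.
    PKG + "/utils/__init__.py": "",
    PKG + "/utils/sniffer.py": "def sniff(text):\n    return ','\n",
}

root = tempfile.mkdtemp()
for relpath, source in files.items():
    path = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write(source)

sys.path.insert(0, root)
csvdemo = importlib.import_module(PKG)
sniffer = importlib.import_module(PKG + ".utils.sniffer")

# The core function is reachable directly on the package.
rows = csvdemo.reader(["a,b", "c,d"])
```

The empty utils/__init__.py keeps the sniffer out of the top-level namespace, so users who want it import `csvdemo.utils.sniffer` explicitly, while the core names need only the package import.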
--
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308 (800) 735-0555 x308