[Csv] What's our status?
Skip Montanaro
skip at pobox.com
Thu Feb 27 18:15:57 CET 2003
Cliff> data,2003/02/27,08:51:00
Cliff> data,2003/02/27,08:52:00
Cliff> data,2003/02/27,08:53:00
Cliff> data,2003/02/27,08:54:00
Cliff> In this case it is difficult to know whether ",", "/" or ":" is
Cliff> the delimiter. It's not entirely unreasonable to use a "preferred"
Cliff> list of delimiters but it's not entirely safe either ;) In fact,
Cliff> the current implementation will resort to a preferred list in
Cliff> this example and return "," as the delimiter. However, given the
Cliff> following:
Cliff> 2003/02/27,08:51:00
Cliff> data,2003/02/27,08:52:00
Cliff> 08:53:00
Cliff> data,2003/02/27,08:54:00
Cliff> It would most likely (without testing) return ":" as the
Cliff> delimiter as it occurs equally consistently with "/", but is
Cliff> higher in the preferred list. This is wrong as the delimiter is
Cliff> clearly ",". That being said, I would simply consider this file
Cliff> as being unsniffable as it has no real pattern.
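To make the ambiguity concrete, here is a minimal sketch that counts each candidate delimiter on every line of the second sample. It is not the sniffer's actual code; the helper name `delimiter_counts` and the three-character candidate set are assumptions made purely for illustration.

```python
CANDIDATES = ",/:"  # assumed candidate set, taken from the sample above


def delimiter_counts(lines, candidates=CANDIDATES):
    """Return {candidate: [count on each line]} for the given lines."""
    return {c: [line.count(c) for line in lines] for c in candidates}


sample = [
    "2003/02/27,08:51:00",
    "data,2003/02/27,08:52:00",
    "08:53:00",
    "data,2003/02/27,08:54:00",
]

counts = delimiter_counts(sample)
for delim, per_line in counts.items():
    print(repr(delim), per_line)
```

On this sample ":" shows up twice on every line while "," varies from line to line, so a test based only on how regularly a character occurs will not pick ",".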
How about this: a candidate delimiter is preferred if two occurrences of it
enclose other candidate delimiters. Conversely, a candidate delimiter whose
paired occurrences only ever surround alphanumeric characters is deemed "less
worthy".
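A rough sketch of that idea, assuming a simple +1/-1 scoring scheme (the function name `enclosure_scores`, the candidate set and the weights are all made up for illustration and are not what is checked in):

```python
def enclosure_scores(lines, candidates=",/:"):
    """Score each candidate by what sits between two of its occurrences.

    +1 when the enclosed text contains another candidate delimiter,
    -1 when the enclosed text is purely alphanumeric ("less worthy").
    """
    scores = {c: 0 for c in candidates}
    for line in lines:
        for c in candidates:
            others = [o for o in candidates if o != c]
            # parts[1:-1] are the pieces strictly between two occurrences of c
            for inner in line.split(c)[1:-1]:
                if any(o in inner for o in others):
                    scores[c] += 1
                elif inner.isalnum():
                    scores[c] -= 1
    return scores


sample = [
    "2003/02/27,08:51:00",
    "data,2003/02/27,08:52:00",
    "08:53:00",
    "data,2003/02/27,08:54:00",
]

scores = enclosure_scores(sample)
best = max(scores, key=scores.get)
print(scores, "->", repr(best))
```

On Cliff's second sample this ranks "," first, because only the commas ever enclose a "/" while "/" and ":" only ever enclose digits. Whether it holds up on messier files is untested.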
Cliff> BTW, I'm +1 on Skip's suggestion to make the utils a package
Cliff> (csv.utils) and will check it into CVS as such. Anyone object?
Nope, sorry I didn't get around to checking in the version you posted
yesterday.
Skip