[Csv] csv.utils.Sniffer notes
Cliff Wells
LogiplexSoftware at earthlink.net
Sat Apr 26 00:03:18 CEST 2003
On Thu, 2003-04-24 at 14:13, Skip Montanaro wrote:
> Sorry for the late notice on this. The 2.3b1 release snuck up on me.
>
> I sent this back on the 12th. It's in my outgoing mail archive, but I
> didn't see it in the mailing list archives and never received any
> responses. Maybe my mailman installation is broken. The last message
> archived appears on the 11th.
>
> Note also that I just checked in a change recommended by the PythonLabs
> folks - it's once again a csv module (no longer a package). Cliff's sniffer
> class is now csv.Sniffer. 2.3b1 is scheduled to be frozen tomorrow at noon.
> After that, the API can't change. If I don't hear from anyone about this
> real soon I'll go ahead and implement the change.
>
> Skip
>
> ----------------------------------------------------------------------
> I guess this is mostly for Cliff, but everyone should feel free to chime in.
> I went to write a subsection describing the Sniffer class and began to
> wonder about a few things.
Sorry I've been out of action. We moved our office and I've been
offline for a few days. Oddly, I had the LAN installed at the new
location two days ago, everything plugged in and ready to go, but didn't
get AC power until about an hour ago =)
> * It's not clear to me that passing a file object to Sniffer.sniff() is
> the correct way to give it data to operate on. First, because you can
> perform multiple operations (sniff, hasHeaders), it requires the file
> object to be rewindable. Second, it doesn't seem to me that setting
> self.fileobj in sniff() is the right thing. What if all the user is
> interested in is whether the CSV file has headers? I think it makes
> more sense to simply pass in a chunk of data to the constructor to use
> as the sample. The caller can then worry about rewindability in his own
> code.
I've been thinking the same thing myself. Rewindability is an issue.
Originally DSV just used a chunk of data, so switching back to that
shouldn't be a problem.
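A minimal sketch of the sample-based approach, assuming the `csv.Sniffer` interface that current Python versions provide (`sniff(sample, delimiters=None)` and `has_header(sample)`, with the sample passed to each method rather than to the constructor). The caller reads a sample, hands the string to the sniffer, and does its own rewinding. The helper name `read_with_sniffer`, the 1024-character sample size and the candidate delimiters are illustrative assumptions.

```python
import csv
import io


def read_with_sniffer(fileobj, sample_size=1024):
    """Sniff a dialect from a sample, then rewind and parse the whole file.

    Hypothetical helper: the Sniffer only ever sees a string, so rewinding
    the file object is the caller's job.
    """
    sample = fileobj.read(sample_size)
    sniffer = csv.Sniffer()
    # Only the listed characters are considered as delimiter candidates.
    dialect = sniffer.sniff(sample, delimiters=";,\t")
    header = sniffer.has_header(sample)
    fileobj.seek(0)  # caller-managed rewind before the real parse
    rows = list(csv.reader(fileobj, dialect))
    return dialect, header, rows


data = io.StringIO("name;age;city\nAlice;30;Paris\nBob;25;Berlin\nCarol;41;Oslo\n")
dialect, header, rows = read_with_sniffer(data)
print(dialect.delimiter, header, rows[0])  # ; True ['name', 'age', 'city']
```

Because the sniffer never touches the file object, a caller that only wants `has_header()` never has to worry about whether the stream can be rewound.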
> * The mixture of camelCase and underscore separators in the method names.
> I believe it's more usual (especially in the Python core) to use an
> underscore to separate words in attribute names.
>
> * The use of eval(). I think the only things we can reasonably have in
> CSV files are strings, ints and floats, so code to determine types can
> look like:
>
> try:
>     thisType = type(int(row[col]))
> except ValueError:
>     try:
>         thisType = type(float(row[col]))
>     except ValueError:
>         thisType = str
Seems reasonable.
> OverflowError doesn't need to be considered in 2.3 because int()
> silently coerces to longs:
>
> >>> int(6e23)
> 600000000000000016777216L
>
> 2.2 and earlier probably still require the OverflowError check.
>
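The eval()-free check quoted above, wrapped into self-contained functions as a sketch for Python 3, where int() returns an unbounded int so the OverflowError branch needed for 2.2 does not apply to string input. `cell_type` and `column_types` are hypothetical names, and marking a column whose fields disagree with None is an assumed convention, not something the Sniffer itself does.

```python
def cell_type(value):
    """Classify one CSV field as int, float or str, without eval()."""
    try:
        return type(int(value))
    except ValueError:
        try:
            return type(float(value))
        except ValueError:
            return str


def column_types(rows):
    """Map each column index to the type shared by all of its fields.

    Columns whose fields disagree (e.g. an int in one row, a str in
    another) are mapped to None; this convention is an assumption.
    """
    types = {}
    for row in rows:
        for col, value in enumerate(row):
            this_type = cell_type(value)
            if col not in types:
                types[col] = this_type
            elif types[col] is not this_type:
                types[col] = None
    return types


print(cell_type("600000000000000016777216"))  # <class 'int'>
print(column_types([["1", "a", "2.5"], ["2", "b", "x"]]))
# {0: <class 'int'>, 1: <class 'str'>, 2: None}
```

Trying int() before float() matters: every string int() accepts is also accepted by float(), so the reverse order would never report int.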
> * I don't think the sniffer needs to offer a register_dialect() method.
> The sniff() method returns a dialect. The programmer can then call the
> normal dialect registration function if need be.
Okay.
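Without a register_dialect() method on the sniffer, registration goes through the module-level function. A short sketch against the csv module as it exists in current Python; the name "sniffed_pipe" and the pipe-delimited sample are invented for illustration.

```python
import csv
import io

sample = "id|score\n1|9.5\n2|7.0\n3|8.25\n"

# sniff() returns a Dialect subclass; registering it is left to the caller.
dialect = csv.Sniffer().sniff(sample, delimiters="|")
csv.register_dialect("sniffed_pipe", dialect)

# The registered name can now be used anywhere a dialect is accepted.
rows = list(csv.reader(io.StringIO(sample), "sniffed_pipe"))
print(csv.get_dialect("sniffed_pipe").delimiter)  # |
print(rows[1])  # ['1', '9.5']
```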
> Attached is a context diff against the current CVS version of Lib/csv.py and
> Lib/test/test_csv.py which implements the various changes except for the
> eval() stuff and adds a couple of simple sniffer tests. The logic for the
> eval() stuff was complex enough that I didn't want to risk screwing it up at
> this point.
You're saying my code isn't beautiful and easy to follow? <wink>
--
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308 (800) 735-0555 x308