[Csv] csv.utils.Sniffer notes

Cliff Wells LogiplexSoftware at earthlink.net
Sat Apr 26 00:03:18 CEST 2003


On Thu, 2003-04-24 at 14:13, Skip Montanaro wrote:
> Sorry for the late notice on this.  The 2.3b1 release snuck up on me.
> 
> I sent this back on the 12th.  It's in my outgoing mail archive, but I
> didn't see it in the mailing list archives and never received any
> responses.  Maybe my mailman installation is broken.  The last message
> archived appears on the 11th.
> 
> Note also that I just checked in a change recommended by the PythonLabs
> folks - it's once again a csv module (no longer a package).  Cliff's sniffer
> class is now csv.Sniffer.  2.3b1 is scheduled to be frozen tomorrow at noon.
> After that, the API can't change.  If I don't hear from anyone about this
> real soon I'll go ahead and implement the change.
> 
> Skip
> 
> ----------------------------------------------------------------------
> I guess this is mostly for Cliff, but everyone should feel free to chime in.
> I went to write a subsection describing the Sniffer class and began to
> wonder about a few things.

Sorry I've been out of action.  We moved our office and I've been
offline for a few days.  Oddly, I had the LAN installed at the new
location two days ago, everything plugged in and ready to go, but didn't
get AC power until about an hour ago =)

>   * It's not clear to me that passing a file object to Sniffer.sniff() is
>     the correct way to give it data to operate on.  First, because you can
>     perform multiple operations (sniff, hasHeaders), it requires the file
>     object to be rewindable.  Second, it doesn't seem to me that setting
>     self.fileobj in sniff() is the right thing.  What if all the user is
>     interested in is whether the CSV file has headers?  I think it makes
>     more sense to simply pass in a chunk of data to the constructor to use
>     as the sample.  The caller can then worry about rewindability in his own
>     code.

I've been thinking the same thing myself.  Rewindability is an issue. 
Originally DSV just used a chunk of data, so switching back to that
shouldn't be a problem.
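
For illustration, here is a minimal sketch of the sample-based approach discussed above. It assumes the csv.Sniffer API as it appears in the standard library today, where sniff() and has_header() each take a string sample. The file contents and the 1024-character sample size are made up for the example.

```python
import csv
import io

# Stand-in for an open CSV file; the contents are invented for illustration.
fileobj = io.StringIO("name,age\nalice,30\nbob,25\ncarol,41\n")

# The caller reads a chunk to use as the sample and handles rewinding itself.
sample = fileobj.read(1024)
fileobj.seek(0)

sniffer = csv.Sniffer()
dialect = sniffer.sniff(sample)          # the sniffer only ever sees a string
has_header = sniffer.has_header(sample)  # usable on its own; no file object is stored

# Parse the rewound file with the detected dialect.
rows = list(csv.reader(fileobj, dialect))
```

Because the sniffer never touches the file object, a caller reading from a non-seekable source (a pipe, say) could instead buffer the sample and feed it back to the reader ahead of the remaining data.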

>   * The mixture of camelCase and underscore separators in the method names.
>     I believe it's more usual (especially in the Python core) to use an
>     underscore to separate words in attribute names.
> 
>   * The use of eval().  I think the only things we can reasonably have in
>     CSV files are strings, ints and floats, so code to determine types can
>     look like:
> 
>         try:
>             thisType = type(int(row[col]))
>         except ValueError:
>             try:
>                 thisType = type(float(row[col]))
>             except ValueError:
>                 thisType = str

Seems reasonable.  
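
As a rough sketch, the nested try/except above could be wrapped in a small helper. The name guess_type is hypothetical, not part of the csv module, and the code is written for current Python, where int() never overflows.

```python
def guess_type(value):
    """Return int, float, or str for a single CSV field, without eval()."""
    try:
        int(value)
        return int
    except ValueError:
        pass
    try:
        float(value)
        return float
    except ValueError:
        return str


# Classify every field of one (made-up) row.
row = ["42", "3.14", "hello", "1e5"]
types = [guess_type(field) for field in row]
```

Only ValueError is caught here, so anything that is not a string (None, for instance) would still raise TypeError; whether that is desirable depends on what the reader hands back.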

>     OverflowError doesn't need to be considered in 2.3 because int()
>     silently coerces to longs:
> 
>         >>> int(6e23)
>         600000000000000016777216L
> 
>     2.2 and earlier probably still require the OverflowError check.
> 
>   * I don't think the sniffer needs to offer a register_dialect() method.
>     The sniff() method returns a dialect.  The programmer can then call the
>     normal dialect registration function if need be.

Okay.
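
A short sketch of what that looks like from the caller's side, assuming the module-level csv.register_dialect() function and a sniff() that accepts a delimiters argument, as in the standard library today. The dialect name "sniffed_semicolon" and the sample data are invented for the example.

```python
import csv

# Made-up semicolon-separated sample.
sample = "a;b;c\n1;2;3\n4;5;6\n"

# sniff() returns a dialect; restricting the candidate delimiters is optional.
dialect = csv.Sniffer().sniff(sample, delimiters=";,")

# Registration is done with the ordinary module-level function,
# so the sniffer needs no register_dialect() method of its own.
csv.register_dialect("sniffed_semicolon", dialect)

# The registered name can now be used wherever a dialect is accepted.
rows = list(csv.reader(sample.splitlines(), "sniffed_semicolon"))
```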

> Attached is a context diff against the current CVS version of Lib/csv.py and
> Lib/test/test_csv.py which implements the various changes except for the
> eval() stuff and adds a couple simple sniffer tests.  The logic for the
> eval() stuff was complex enough that I didn't want to risk screwing it up at
> this point.

You're saying my code isn't beautiful and easy to follow? <wink>

-- 
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308


