[Csv] csv.utils.Sniffer notes

Skip Montanaro skip at pobox.com
Sun Apr 13 03:46:35 CEST 2003


I guess this is mostly for Cliff, but everyone should feel free to chime in.
I went to write a subsection describing the utils.Sniffer class and began to
wonder about a few things.

  * It's not clear to me that passing a file object to Sniffer.sniff() is
    the correct way to give it data to operate on.  First, because you can
    perform multiple operations (sniff, hasHeaders), it requires the file
    object to be rewindable.  Second, it doesn't seem to me that setting
    self.fileobj in sniff() is the right thing.  What if all the user is
    interested in is whether the CSV file has headers?  I think it makes
    more sense to simply pass in a chunk of data to the constructor to use
    as the sample.  The caller can then worry about rewindability in his own
    code.

  * The method names mix camelCase and underscore separators (hasHeaders
    vs. register_dialect).  I believe it's more usual (especially in the
    Python core) to use an underscore to separate words in attribute names.

  * The use of eval().  I think the only things we can reasonably have in
    CSV files are strings, ints and floats, so code to determine types can
    look like:

        try:
            thisType = type(int(row[col]))
        except ValueError:
            try:
                thisType = type(float(row[col]))
            except ValueError:
                thisType = str

    OverflowError doesn't need to be considered in 2.3 because int()
    silently coerces to longs:

        >>> int(6e23)
        600000000000000016777216L

    2.2 and earlier probably still require the OverflowError check.

  * I don't think the sniffer needs to offer a register_dialect() method.
    The sniff() method returns a dialect.  The programmer can then call the
    normal dialect registration function if need be.
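
To make the eval() point a little more concrete, here is a rough sketch of
how the try/except fragment above could be wrapped up and applied to a whole
column.  The names guess_type and guess_column_type are made up for
illustration and are not part of the current Sniffer code.  The rule for
combining rows (all ints give int, a mix of ints and floats gives float,
anything else gives str) is my assumption about what we'd want.

```python
def guess_type(field):
    """Return int, float or str for one CSV field, without calling eval()."""
    for convert in (int, float):
        try:
            convert(field)
        except (ValueError, OverflowError):
            # OverflowError only matters on interpreters where int() does
            # not promote to long; catching it elsewhere is harmless.
            continue
        return convert
    return str


def guess_column_type(rows, col):
    """Return the narrowest of int, float, str that fits every row in col."""
    kinds = set(guess_type(row[col]) for row in rows)
    if not kinds:
        return str          # no data rows: nothing to go on
    if kinds == {int}:
        return int
    if kinds <= {int, float}:
        return float        # mixed ints and floats read as floats
    return str
```

Since nothing is ever evaluated as code, a field containing an arbitrary
Python expression just falls through to str.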

Attached is an untested version of sniffer.py which implements the various
changes except for the eval() stuff.  The logic there was complex enough
that I didn't want to risk screwing it up.
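
Separately from the diff, here is roughly what I have in mind for the first
point, seen from the caller's side.  This is only a sketch: SampleSniffer
and read_sample are hypothetical names, and the detection itself is
delegated to the csv.Sniffer class in the current standard library as a
stand-in for whatever logic the real class uses.

```python
import csv
import io


class SampleSniffer:
    """Hypothetical wrapper: holds a text sample, never a file object."""

    def __init__(self, sample):
        self.sample = sample

    def sniff(self):
        # Delegates detection to the standard library's csv.Sniffer.
        return csv.Sniffer().sniff(self.sample)

    def has_header(self):
        return csv.Sniffer().has_header(self.sample)


def read_sample(fileobj, size=1024):
    """Read up to size characters, then restore the file position."""
    start = fileobj.tell()
    sample = fileobj.read(size)
    fileobj.seek(start)
    return sample


f = io.StringIO("name,age\nalice,30\nbob,25\n")
sniffer = SampleSniffer(read_sample(f))
dialect = sniffer.sniff()
reader = csv.reader(f, dialect)   # f is still at the start of the data
```

The sniffer never touches the file, so a caller who only wants has_header()
doesn't have to call sniff() first, and a caller with a non-rewindable
stream can buffer the sample however he likes.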

Skip

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sniffer.diff
Type: application/octet-stream
Size: 8725 bytes
Desc: sniffer diff
Url : http://mail.python.org/pipermail/csv/attachments/20030412/63b3b2da/attachment.obj 

