csv.Sniffer - delete in Python 3.0?

skip at pobox.com
Wed Mar 19 16:44:05 CET 2008


The csv module contains a Sniffer class which is supposed to deduce the
delimiter and quote character as well as the presence or absence of a header
in a sample taken from the start of a purported CSV file.  I no longer
remember who wrote it, and I've never been a big fan of it.  It determines
the delimiter based almost solely on character frequencies.  It doesn't
consider what the actual structure of a CSV file is or that delimiters and
quote characters are almost always taken from the set of punctuation or
whitespace characters.  Consequently, it can cause occasional
head-scratching:

    >>> sample = """\
    ... abc8def
    ... def8ghi
    ... ghi8jkl
    ... """
    >>> import csv
    >>> d = csv.Sniffer().sniff(sample)
    >>> d.delimiter
    '8'
    >>> sample = """\
    ... a8bcdef
    ... ab8cdef
    ... abc8def
    ... abcd8ef
    ... """
    >>> d = csv.Sniffer().sniff(sample)
    >>> d.delimiter
    'f'

It's not clear to me that people use letters or digits very often as
delimiters.  Both samples above probably represent data from single-column
files, not double-column files with '8' or 'f' as the delimiter.
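
For what it's worth, sniff() already accepts an optional delimiters argument
that restricts the candidates it will consider.  Below is a minimal sketch,
assuming that ASCII punctuation plus space and tab is a reasonable candidate
set.  Both that choice and the helper name sniff_punctuation_only are
illustrative assumptions, not part of the csv module:

```python
import csv
import string

# Assumed candidate set: ASCII punctuation plus space and tab.
# (Hypothetical choice for illustration; adjust to taste.)
CANDIDATES = string.punctuation + " \t"


def sniff_punctuation_only(sample):
    """Return the sniffed delimiter, or None if no candidate fits.

    Hypothetical helper: it only forwards to csv.Sniffer().sniff()
    with the delimiters argument set, and turns the csv.Error raised
    when no delimiter can be determined into None.
    """
    try:
        return csv.Sniffer().sniff(sample, delimiters=CANDIDATES).delimiter
    except csv.Error:
        return None
```

With this restriction the first sample above should come back as None
instead of '8', while a conventional sample such as "a,b,c\n1,2,3\n4,5,6\n"
still sniffs as ','.  The caller is then free to treat None as "probably a
single-column file".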

I would be happy to get rid of it in 3.0, but I'm also aware that some
people use it.  I'd like feedback from the Python community about this.  If
I removed it, is there someone out there who wants it badly enough to
maintain it on PyPI?

Thanks,

-- 
Skip Montanaro - skip at pobox.com - http://www.webfast.com/~skip/


