[Python-3000] csv.Sniffer - delete in Python 3.0?
skip at pobox.com
Wed Mar 19 16:44:05 CET 2008
The csv module contains a Sniffer class which is supposed to deduce the
delimiter and quote character as well as the presence or absence of a header
in a sample taken from the start of a purported CSV file. I no longer
remember who wrote it, and I've never been a big fan of it. It determines
the delimiter based almost solely on character frequencies. It doesn't
consider what the actual structure of a CSV file is or that delimiters and
quote characters are almost always taken from the set of punctuation or
whitespace characters. Consequently, it can occasionally cause some
head-scratching:
>>> sample = """\
... abc8def
... def8ghi
... ghi8jkl
... """
>>> import csv
>>> d = csv.Sniffer().sniff(sample)
>>> d.delimiter
'8'
>>> sample = """\
... a8bcdef
... ab8cdef
... abc8def
... abcd8ef
... """
>>> d = csv.Sniffer().sniff(sample)
>>> d.delimiter
'f'
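Here is a deliberately simplified model of frequency-based delimiter guessing that shows why both samples come out this way. It is a sketch, not the csv module's actual code. A character counts as a candidate if it occurs the same number of times on every line. The function name `consistent_chars` is made up for illustration.

```python
from collections import Counter


def consistent_chars(sample: str) -> dict[str, int]:
    """Map each character that occurs the same number of times on every
    non-empty line of ``sample`` to that per-line count.

    Simplified model only (hypothetical helper): the real csv.Sniffer also
    tolerates some inconsistency between lines and breaks ties between
    candidates.
    """
    lines = [line for line in sample.split("\n") if line]
    per_line = [Counter(line) for line in lines]
    candidates = {}
    for ch in set().union(*per_line):
        # Counter returns 0 for characters absent from a line.
        counts = {counter[ch] for counter in per_line}
        # One distinct count means the character is perfectly regular; it is
        # nonzero because ch occurs on at least one line.
        if len(counts) == 1:
            candidates[ch] = counts.pop()
    return candidates


print(consistent_chars("abc8def\ndef8ghi\nghi8jkl\n"))
# {'8': 1}
print(sorted(consistent_chars("a8bcdef\nab8cdef\nabc8def\nabcd8ef\n")))
# ['8', 'a', 'b', 'c', 'd', 'e', 'f']
```

In the first sample only '8' qualifies, so it wins outright. In the second sample all seven characters qualify equally, and the real Sniffer has to break that tie by some other rule. Nothing in the model asks whether a candidate is a plausible delimiter at all.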
It's not clear to me that people use letters or digits very often as
delimiters. Both samples above probably represent data from single-column
files, not double-column files with '8' or 'f' as the delimiter.
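Anyone who keeps using the class can already narrow the guess. `Sniffer.sniff()` accepts an optional `delimiters` string that restricts which characters are considered. The minimal sketch below assumes that ASCII punctuation plus space and tab is an acceptable candidate set. The `sniff_delimiter` helper and the `PLAUSIBLE_DELIMITERS` constant are hypothetical. When no candidate fits, `sniff()` raises `csv.Error`, and the helper treats that as a hint that the data is a single column.

```python
import csv
import string
from typing import Optional

# Hypothetical candidate set: ASCII punctuation plus space and tab.
PLAUSIBLE_DELIMITERS = string.punctuation + " \t"


def sniff_delimiter(sample: str) -> Optional[str]:
    """Guess a delimiter, considering only punctuation and whitespace.

    Returns None when csv.Sniffer finds no plausible delimiter, which for
    samples like the ones above suggests single-column data.
    """
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=PLAUSIBLE_DELIMITERS)
    except csv.Error:
        # Raised when no character in PLAUSIBLE_DELIMITERS looks like a delimiter.
        return None
    return dialect.delimiter


print(sniff_delimiter("abc8def\ndef8ghi\nghi8jkl\n"))  # None
print(sniff_delimiter("a,b\nc,d\ne,f\n"))              # ,
```

With this restriction, both samples above yield None instead of '8' or 'f', and genuinely comma-separated data is still detected.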
I would be happy to get rid of it in 3.0, but I'm also aware that some
people use it. I'd like feedback from the Python community about this. If
I removed it, is there someone out there who wants it badly enough to
maintain it on PyPI?
Thanks,
--
Skip Montanaro - skip at pobox.com - http://www.webfast.com/~skip/