[Csv] Re: [PEP305] Python 2.3: a small change request in CSV module

Bernard Delmée bdelmee at advalvas.REMOVEME.be
Sat May 17 15:20:57 EDT 2003


> With the following (completely untested) you could sniff and read an
> input source while only reading it once.
>
>     class SniffedInput:
>         [implementation omitted]
>
> Does the above satisfy your needs?

It does, thanks Dave (give or take a few typos trivial to fix).
So I now have three working solutions:

(1) let sniffer detect dialect, reset input then iterate
(2) essentially as (1), except wrapped in a generator
(3) your iterator-based suggestion (SniffedInput); with the
    advantage of not requiring a seek on the file-like data source
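
A minimal sketch of options (1) and (2), assuming a seekable text
source and the current standard-library csv module. The function names
and the sample_size parameter are hypothetical, chosen only for
illustration; they are not taken from the code that was timed.

```python
import csv
import io


def read_with_seek(source, sample_size=4096):
    """Option (1): sniff a sample, rewind the source, then read every row."""
    sample = source.read(sample_size)
    dialect = csv.Sniffer().sniff(sample)
    source.seek(0)  # requires a seekable source
    return list(csv.reader(source, dialect))


def iter_with_seek(source, sample_size=4096):
    """Option (2): the same steps, wrapped in a generator."""
    sample = source.read(sample_size)
    dialect = csv.Sniffer().sniff(sample)
    source.seek(0)
    for row in csv.reader(source, dialect):
        yield row


# Small in-memory demonstration with a semicolon-separated source.
demo_text = "a;b;c\n1;2;3\n4;5;6\n"
rows = read_with_seek(io.StringIO(demo_text))
lazy_rows = list(iter_with_seek(io.StringIO(demo_text)))
```

Both variants depend on seek(), which is what option (3) avoids.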

I tested them against a file holding 115,000 lines of 56 fields, and
the respective runtimes are: (1) 5.5s, (2) 6.5s, (3) 6.9s.

I think (2) and (3) add overhead to every readline(), if only an extra
Python function call (iterator/generator), and these accumulate to
a perceptible, albeit small, slowdown.
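
For reference, a seek-free reader in the spirit of option (3) might
look like the sketch below. It is not the omitted SniffedInput class:
the name sniff_and_read and the sample_lines parameter are
hypothetical. It buffers the first few lines, sniffs them, and then
replays the buffer ahead of the remaining input with itertools.chain.
Whether this changes the per-line cost measured above is untested.

```python
import csv
import itertools


def sniff_and_read(lines, sample_lines=20):
    """Sniff the dialect from the first lines, then parse all lines.

    `lines` is any iterable of text lines; it is consumed exactly once
    and never needs to support seek().
    """
    lines = iter(lines)
    # Buffer a small sample from the front of the input.
    head = list(itertools.islice(lines, sample_lines))
    dialect = csv.Sniffer().sniff("".join(head))
    # Replay the buffered lines, then continue with the rest.
    return csv.reader(itertools.chain(head, lines), dialect)


# A generator cannot be rewound, so it stands in for a non-seekable source.
source = (line for line in ["a;b;c\n", "1;2;3\n", "4;5;6\n"])
rows = list(sniff_and_read(source))
```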

> Should something like that be placed into the csv module?

I dunno, really. Given the above results, the overhead would probably
only go away if this were supported by the C reader() code, with usage
close to my original suggestion. That's probably too much to ask,
especially if I'm the only user to have asked for it.

Now there's something else Skip got me thinking about (maybe this
should be a separate post). He rightly pointed out that there's no
guarantee that the sniffer will guess right. For example, if most of
your fields are "dd/mm/yy" dates, the sniffer may decide (untried)
that '/' is the most likely delimiter. Hence let me reiterate my
suggestion to tip the sniffer off by adding a second argument to
Sniffer().sniff(), an optional string holding the allowed or expected
delimiters. Short of direct support for multiple separators, which
may be too rarely needed to move to the C implementation, it would
be very useful to have a means to assist the sniffer in guessing right.
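
As an illustration of such a hint: current versions of the
standard-library csv module accept an optional delimiters argument to
Sniffer.sniff(), which restricts the candidates the sniffer may
choose from. The sketch below uses made-up date data and makes no
claim about what the sniffer would pick without the hint.

```python
import csv

# Made-up sample: every field is a dd/mm/yy date, separated by ';'.
sample = (
    "17/05/03;18/05/03;19/05/03\n"
    "20/05/03;21/05/03;22/05/03\n"
    "23/05/03;24/05/03;25/05/03\n"
)

# Only ';' and ',' are allowed as delimiters, so '/' cannot be selected.
dialect = csv.Sniffer().sniff(sample, delimiters=";,")

# Parse the sample with the sniffed dialect.
rows = list(csv.reader(sample.splitlines(), dialect))
```

Here dialect.delimiter is ';', and each date stays intact as a single
field.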

Thanks for your attention,

Bernard.

PS: do I have to subscribe somewhere to follow csv at mail.mojam.com ?






