[PEP305] Python 2.3: a small change request in CSV module

Andrew Dalke adalke at mindspring.com
Thu May 15 20:34:07 EDT 2003


Skip Montanaro
> I recognize the issue you raise.  As originally written, the Sniffer class
> also took a file-like object, however, it relied on being able to rewind the
> stream.  This would, for example, prevent you from feeding sys.stdin to the
> sniffer.  I also felt the decision of rewinding the stream belonged with the
> caller.  I decided to change it to accepting a small data sample instead.
> You can avoid multiple opens by rewinding the stream yourself (in the common
> case where the stream can be rewound):

BTW, I had a similar sniffing problem so I wrote the 'ReseekFile' module at
  http://dalkescientific.com/Python/#reseekfile
This provides a wrapper to a file handle, including a network socket which
cannot be reopened, and lets you reseek to the beginning, but only to the
beginning.

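     import csv
     import ReseekFile

     infile = ReseekFile.prepare_input_source("http://some-where/over/the/rainbow/")
     sample = infile.read(8192)   # read a sample for the sniffer
     infile.seek(0)               # rewind to the beginning
     infile.nobuffer()            # stop saving data now that the rewind is done
     dialect = csv.Sniffer().sniff(sample)
     for fields in csv.reader(infile, dialect):
         pass  # do something with fields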

The 'nobuffer()' is used to prevent internal buffering.  It's essential because
at some point you'll need to read multiple megabytes of data without storing
it.
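
To make the buffering behaviour concrete, here is a minimal sketch of a
rewind-to-start wrapper.  It is not the actual ReseekFile code: the class
name RewindableStream and its internals are hypothetical, it assumes a text
stream, and it only covers read(), seek(0) and nobuffer().

```python
class RewindableStream:
    """Hypothetical illustration only; not the real ReseekFile."""

    def __init__(self, stream):
        self._stream = stream    # underlying stream, possibly not seekable
        self._saved = ""         # data kept so that seek(0) can replay it
        self._pos = 0            # current read position inside _saved
        self._buffering = True   # True until nobuffer() is called

    def read(self, size=-1):
        # Serve previously saved data first.
        if size < 0:
            chunk = self._saved[self._pos:]
        else:
            chunk = self._saved[self._pos:self._pos + size]
        self._pos += len(chunk)
        fresh = ""
        if size < 0 or len(chunk) < size:
            # Saved data is exhausted; continue from the real stream.
            remaining = -1 if size < 0 else size - len(chunk)
            fresh = self._stream.read(remaining)
            if self._buffering:
                self._saved += fresh
                self._pos = len(self._saved)
        self._drop_consumed()
        return chunk + fresh

    def seek(self, offset):
        # Only a rewind to the beginning is supported, and only while
        # the data read so far is still being kept.
        if offset != 0:
            raise ValueError("can only seek to the beginning")
        if not self._buffering:
            raise ValueError("cannot rewind after nobuffer()")
        self._pos = 0

    def nobuffer(self):
        # Stop saving new data; already saved data is still replayed once.
        self._buffering = False
        self._drop_consumed()

    def _drop_consumed(self):
        # Once buffering is off and the saved data has been replayed,
        # release it so large inputs are not held in memory.
        if not self._buffering and self._pos >= len(self._saved):
            self._saved = ""
            self._pos = 0
```

With this sketch, the read(8192) / seek(0) / nobuffer() sequence above
replays the sample once and then reads straight from the underlying stream.
Passing it to csv.reader would additionally need line iteration, which is
left out here.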

OTOH, I somewhat disagree with Skip.  I decided that my sniffer interface
would require a file object which supports tell() and seek(), with the
guarantee that seek() is only ever called with the value tell() returned
when the file was passed to the sniffer.  If the file object doesn't support
that, wrap it in a ReseekFile.  In that case, the caller is the one in charge
of calling 'nobuffer()'.
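
As a rough sketch of that kind of interface, built on the standard csv
module: the function name sniff_file and the sample_size default are
assumptions made for illustration, not part of the csv module or of
ReseekFile.

```python
import csv


def sniff_file(fileobj, sample_size=8192):
    """Guess the CSV dialect of fileobj and leave its position unchanged.

    fileobj must support tell() and seek(); seek() is only called with
    the value tell() returned on entry.
    """
    start = fileobj.tell()
    try:
        sample = fileobj.read(sample_size)
    finally:
        # Put the stream back where the caller left it.
        fileobj.seek(start)
    return csv.Sniffer().sniff(sample)
```

The caller can then hand the same file object to csv.reader:

```python
import io

infile = io.StringIO("name;age\nalice;30\nbob;25\ncarol;41\n")
dialect = sniff_file(infile)
for fields in csv.reader(infile, dialect):
    print(fields)
```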

                    Andrew
                    dalke at dalkescientific.com





