[Csv] Module question...
altis at semi-retired.com
Thu Jan 30 07:55:03 CET 2003
> From: Andrew McNamara
> The way we've speced it, the module only deals with file objects. I
> wonder if there's any need to deal with strings, rather than files?
A string can be wrapped as StringIO to appear as a file and there may also
be other file-like objects that people might want to pass in.
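
As a rough illustration, here is a minimal sketch of wrapping a string so it looks like a file. It assumes the csv module as it exists in today's Python standard library (csv.reader and io.StringIO), which may not match the interface being specced in this thread. The sample data is made up.

```python
import csv
import io

# The text to parse lives in memory, not in a file on disk (made-up sample data).
data = "name,qty\nwidget,3\ngadget,5\n"

# io.StringIO gives the string a file-like interface that the parser can read from.
buffer = io.StringIO(data)

# The parser neither knows nor cares that there is no real file behind the object.
rows = list(csv.reader(buffer))
```

Any other object that yields lines of text the way a file does could be passed in the same way.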
> What was the rationale for using files, rather than making the user do their
> own readline(), etc?
I'll try to summarize. If this is too simplistic or incorrect, I'm sure
someone will speak up :)
The simplest solution might have been to provide a file path and then let
the parser handle all the opening, reading, and closing, returning a result
list. However, that is far too limiting: if you want to parse a string or
anything else that isn't a physical file on disk, you have to collect the
raw data, write it to a temp file, and then pass in the path of the temp
file. Definitely too cumbersome.
It would be possible to require the user code to supply one large string to
parse, thus putting the burden of opening, reading, and closing the
file-like object on the user code. This wastes memory, since the whole input
has to be held as a single string, which can be a problem especially for
large data files.
One other possibility would be for the parser to only deal with one row at a
time, leaving it up to the user code to feed the parser the row strings. But
given the various possible line endings for a row of data and the fact that
a column of a row may contain a line ending, not to mention all the other
escape character issues we've discussed, this would be error-prone.
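
To make the problem concrete, here is a small sketch comparing user code that splits rows itself with a parser that decides where each record ends. It again assumes today's standard-library csv module rather than the interface under discussion, and the sample data is invented for the example.

```python
import csv
import io

# One quoted field contains a line ending, so this is three records on four lines.
data = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

# Naive approach: user code cuts the input into rows and columns itself.
# The embedded newline is mistaken for a record boundary.
naive_rows = [line.split(",") for line in data.splitlines()]

# Parser approach: the parser reads from a file-like object and decides
# where each record ends, so the quoted newline stays inside its field.
parsed_rows = list(csv.reader(io.StringIO(data, newline="")))
```

The naive split yields four "rows", two of them broken halves of the same record, while the parser returns three records with the multi-line comment intact.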
The solution was to simply accept a file-like object and let the parser do
the interpretation of a record. By having the parser present an iterable
interface, the user code still gets the convenience of processing row by row
if needed; if no per-row processing is desired, a result list can easily be
obtained. This should provide the most flexibility while still being easy to
use.
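
Both styles of use can be sketched as follows. This is only an illustration under the assumption that the parser behaves like csv.reader in today's Python standard library; open_source() is a hypothetical helper standing in for whatever file-like object the user code has.

```python
import csv
import io

def open_source():
    # Hypothetical helper: returns a fresh file-like object over made-up data.
    return io.StringIO("item,qty\nwidget,3\ngadget,5\n", newline="")

# Style 1: process one row at a time, never holding the whole result in memory.
total = 0
reader = csv.reader(open_source())
header = next(reader)          # the first record holds the column names
for row in reader:
    total += int(row[1])       # sum the qty column

# Style 2: no per-row processing wanted, so just collect everything into a list.
all_rows = list(csv.reader(open_source()))
```

The same iterable serves both cases, so the user code picks the style without the parser needing two separate entry points.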