[Csv] Module question...

Thu Jan 30 08:33:52 CET 2003

>> The way we've specced it, the module only deals with file objects. I
>> wonder if there's any need to deal with strings rather than files?

BTW, I'm asking this because it's something that will come back to haunt
us if we get it wrong - it's something we need to make the right call on.

>A string can be wrapped as StringIO to appear as a file and there may also
>be other file-like objects that people might want to pass in.

Yes - if the most common use by far is reading and writing files, then
this is the right answer (i.e., say "use StringIO if you really need to
parse a string").
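A minimal sketch of the StringIO suggestion, written against the csv module as it later shipped in Python 3 (hence io.StringIO rather than the older StringIO module). The sample data is made up:

```python
import csv
import io

# Made-up CSV text held in an ordinary string rather than in a file.
text = "name,qty\nspam,3\neggs,12\n"

# Wrapping the string in io.StringIO gives the reader a file-like object,
# so no temp file or separate string-parsing entry point is needed.
rows = list(csv.reader(io.StringIO(text)))
print(rows)  # [['name', 'qty'], ['spam', '3'], ['eggs', '12']]
```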

>> What was the rationale for using files, rather than making the user do
>> their own readline(), etc.?
>
>I'll try to summarize; if this is too simplistic or incorrect, I'm sure
>someone will speak up :)
>
>The simplest solution might have been to provide a file path and then let
>the parser handle all the opening, reading, and closing, returning a result
>list. However, that is far too limiting: if you do want to parse a string,
>or something that isn't a physical file on disk, you have to collect the
>raw data, write it to a temp file, and then pass in the path of the temp
>file. Definitely too cumbersome.

Yeah - I'm certainly not suggesting that.

>It would be possible to require the user code to supply one large string
>to parse, thus putting the burden of opening, reading, and closing the
>file-like object on the caller. This wastes memory, which can be a
>problem, especially for large data files.

Agreed.

>One other possibility would be for the parser to deal with only one row at
>a time, leaving it up to the user code to feed the parser the row strings.
>But given the various possible line endings for a row of data, and the fact
>that a field within a row may itself contain a line ending, not to mention
>all the other escape character issues we've discussed, this would be
>error-prone.
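To illustrate the embedded-line-ending problem, here is a small sketch (again using the modern Python 3 csv module, with made-up data) of a quoted field that contains a newline. Treating every line break as a row boundary splits the record in two, while the reader keeps track of quoting across lines:

```python
import csv
import io

# One header row and one record whose second field contains a newline.
data = 'id,note\r\n1,"line one\nline two"\r\n'

# Naive approach: treat every line break as a row boundary.
naive = [line.split(",") for line in data.splitlines()]
print(len(naive))  # 3 -- the quoted field was split across two "rows"

# csv.reader tracks quoting state across lines, so the record stays intact.
parsed = list(csv.reader(io.StringIO(data)))
print(parsed)  # [['id', 'note'], ['1', 'line one\nline two']]
```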

This is the way the Object Craft module has worked - it works well enough,
and the universal newline support in Python 2.3 makes it more seamless.
I'm not saying I'm wedded to this scheme, but I'd like to be clear on why
we've chosen one over the other.

I'm trying to think of an example where operating on a file-like object
would be too restrictive, and I can't - oh, here's one: what if you
wanted to do some pre-processing on the data (say it was uuencoded)?
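One way the uuencoded case could be handled, sketched against the modern csv module, which accepts any iterable of strings rather than only real files: a generator decodes the input and re-splits it into text lines before the reader sees it. The helper uudecoded_lines is hypothetical, and it assumes the uuencode "begin"/"end" lines have already been stripped, so every input line is a uuencoded chunk of at most 45 bytes that need not line up with CSV row boundaries:

```python
import binascii
import csv

def uudecoded_lines(encoded_lines, encoding="utf-8"):
    """Hypothetical pre-processing stage for csv.reader.

    Assumes header/trailer lines are already removed, so each input line
    is one uuencoded chunk. Chunks may end mid-row, so decoded bytes are
    buffered and only complete text lines are yielded.
    """
    pending = b""
    for line in encoded_lines:
        pending += binascii.a2b_uu(line)
        lines = pending.splitlines(keepends=True)
        # Keep a trailing partial line until the next chunk completes it.
        if lines and not lines[-1].endswith(b"\n"):
            pending = lines.pop()
        else:
            pending = b""
        for complete in lines:
            yield complete.decode(encoding)
    if pending:
        yield pending.decode(encoding)

# Usage: uuencode some made-up CSV in 45-byte chunks, then parse it back.
payload = b"name,qty\nspam,3\neggs,12\nham,5\nbeans,7\nchips,42\nmushy peas,1\n"
encoded = [binascii.b2a_uu(payload[i:i + 45]) for i in range(0, len(payload), 45)]
rows = list(csv.reader(uudecoded_lines(encoded)))
print(rows)
```

Here the first chunk ends in the middle of the "chips,42" row, which is why the generator buffers rather than passing each decoded chunk straight through.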

>The solution was to simply accept a file-like object and let the parser do
>the interpretation of a record. By having the parser present an iterable
>interface, the user code still gets the convenience of per-row processing
>when needed, and when no processing is desired, a result list can easily
>be obtained.
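A short sketch of the two consumption patterns that an iterable reader allows, using the modern Python 3 csv module and made-up data:

```python
import csv
import io

data = "name,qty\nspam,3\neggs,12\nham,5\n"

# Per-row processing: the reader yields one parsed row at a time, so the
# caller handles rows one by one without building a list first.
reader = csv.reader(io.StringIO(data))
header = next(reader)
total = sum(int(qty) for _name, qty in reader)
print(header, total)  # ['name', 'qty'] 20

# No processing wanted: the same interface collapses to a result list.
all_rows = list(csv.reader(io.StringIO(data)))
print(len(all_rows))  # 4
```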
>
>This should provide the most flexibility while still being easy to use.

Should the object just be defined as an iterable, leaving closing, etc.
up to the user of the module? One downside of this is that you can't
rewind an iterator, so things like the sniffer would be out of luck. We
can't ensure that the passed file is rewindable either. Hmm.
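One possible workaround for the non-rewindable case, sketched against the modern csv module and not something decided here: buffer the first few lines, sniff the dialect from them, then chain them back in front of the untouched remainder. The helper sniff_and_read and its parameters are hypothetical; since a small sample can mislead the sniffer, the usage restricts the candidate delimiters:

```python
import csv
import io
import itertools

def sniff_and_read(lines, sample_size=5, delimiters=None):
    """Hypothetical helper: sniff the dialect from the first few lines of a
    possibly non-rewindable iterable, then parse all of the lines.

    Instead of rewinding, the sampled lines are kept in a list and chained
    back in front of the rest of the iterator, which was never consumed.
    """
    it = iter(lines)
    head = list(itertools.islice(it, sample_size))
    dialect = csv.Sniffer().sniff("".join(head), delimiters=delimiters)
    return csv.reader(itertools.chain(head, it), dialect)

# Usage: a made-up, semicolon-delimited source behind a generator, which
# (like a pipe or socket) cannot be rewound.
source = (line for line in io.StringIO("a;b;c\n1;2;3\n4;5;6\n7;8;9\n"))
rows = list(sniff_and_read(source, sample_size=3, delimiters=";,"))
print(rows)  # [['a', 'b', 'c'], ['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
```

The cost is holding sample_size lines in memory, which stays bounded regardless of the size of the input.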

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/