[Csv] CSV interface question
Cliff Wells
LogiplexSoftware at earthlink.net
Wed Jan 29 20:18:16 CET 2003
On Wed, 2003-01-29 at 10:17, Skip Montanaro wrote:
> Cliff> You've lost me, I'm afraid. What I'm saying is that:
>
> Cliff> csvreader = reader(file("test_data/sfsample.csv", 'r'),
> Cliff> dialect='excel')
>
> Cliff> isn't as flexible as
>
> Cliff> csvreader = reader(file("test_data/sfsample.csv", 'r'),
> Cliff> dialect=excel)
>
> Cliff> where excel is either a pre-defined dictionary/class or a
> Cliff> user-created dictionary/class.
>
> Yes, but my string just indexes into a mapping to get to the real dict which
> stores the parameter settings, as I indicated in an earlier post:
>
> I was thinking of dialects as dicts. You'd have
>
>     excel_dialect = { "quotechar": '"',
>                       "delimiter": ',',
>                       "lineterminator": '\r\n',
>                       ...
>                     }
>
> with a corresponding mapping as you suggested:
>
>     settings = { 'excel': excel_dialect,
>                  'excel-tsv': excel_tabs_dialect, }
>
> then in the factory functions do something like:
>
>     def reader(fileobj, dialect="excel", **kwds):
>         kwargs = copy.copy(settings[dialect])
>         kwargs.update(kwds)
>         # possible sanity check on kwargs here ...
>         return _csv.reader(fileobj, **kwargs)
I understand this, but I think you miss my point (or I missed you with
it ;)  Consider the programmer actually defining a new dialect: if they
can pass a class or other structure (a dict is fine), they can create it
on the fly with minimal work.  Using a *string*, they must first
"register" that string somewhere (probably in the mapping we agree upon)
before they can actually make the function call.  Granted, it's only an
extra step, but it requires a bit more knowledge (of the mapping) and
doesn't seem to provide a real benefit.  If you prefer a mapping to a
class, that's fine, but let's pass the mapping rather than a string
referring to it:
    excel = { "quotechar": '"',
              "delimiter": ',',
              "lineterminator": '\r\n',
              ...
            }

    settings = { 'excel': excel,
                 'excel-tsv': excel_tsv, }

    def reader(fileobj, dialect=excel, **kwds):
        kwargs = copy.copy(dialect)
        kwargs.update(kwds)
        # possible sanity check on kwargs here ...
        return _csv.reader(fileobj, **kwargs)
This allows the user to do such things as:

    mydialect = { ... }
    reader(fileobj, mydialect, ...)

rather than

    mydialect = { ... }
    settings['mydialect'] = mydialect
    reader(fileobj, 'mydialect', ...)

To use the settings table for getting a default, they can still use

    reader(fileobj, settings['excel-tsv'], ...)

or just use the excel-tsv settings directly:

    reader(fileobj, excel_tsv, ...)
(BTW, I prefer 'dialects' to 'settings' for the mapping name, just for consistency).
I'll grant that the difference is small, but the string approach still
requires one extra line and one extra piece of knowledge, with no real
benefit to the programmer, AFAICT.  If you don't agree I'll let it pass,
as it *is* a relatively minor difference.
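(Purely as a rough sketch, and not something either of us has actually
proposed: if we ended up wanting to support both calling styles, the
factory could accept either a mapping or a string key into the table.
The names below are just the ones from my example above.)

    import copy

    def reader(fileobj, dialect=excel, **kwds):
        # accept either a registered name or a dialect mapping
        if isinstance(dialect, str):
            dialect = settings[dialect]
        kwargs = copy.copy(dialect)
        kwargs.update(kwds)
        # possible sanity check on kwargs here ...
        return _csv.reader(fileobj, **kwargs)

That would keep the mapping as the primary interface while still
allowing the string shortcut for the predefined dialects.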
> Did that not make it out? I also think it's cleaner if we have a data file
> which is loaded at import time to define the various dialects. That way we
> aren't mixing too much data into our code. It also opens up the opportunity
> for users to later specify their own dialect data files. Where I indicated
> "possible sanity check" above would be a call to a validation function on
> the settings.
+1 on this, but only if you cave on the other one <wink>
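(Just to sketch what I imagine for the data-file idea -- the file format
and the validate_dialect() helper here are invented for illustration,
not anything we've agreed on.  A ConfigParser-style file with one
section per dialect would be the least work:)

    import ConfigParser

    def load_dialects(path):
        # hypothetical loader, run once at import time
        parser = ConfigParser.ConfigParser()
        parser.read(path)
        dialects = {}
        for name in parser.sections():
            params = {}
            for option in parser.options(name):
                params[option] = parser.get(name, option)
            # escape handling for values like '\r\n' glossed over here
            validate_dialect(params)   # the sanity check you mentioned
            dialects[name] = params
        return dialects

    dialects = load_dialects("dialects.cfg")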
--
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308 (800) 735-0555 x308
_______________________________________________
Csv mailing list
Csv at mail.mojam.com
http://manatee.mojam.com/mailman/listinfo/csv