[Csv] CSV interface question
Dave Cole
djc at object-craft.com.au
Thu Jan 30 00:15:23 CET 2003
Cliff> You've lost me, I'm afraid. What I'm saying is that:
Cliff> csvreader = reader(file("test_data/sfsample.csv", 'r'),
Cliff> dialect='excel')
Cliff> isn't as flexible as
Cliff> csvreader = reader(file("test_data/sfsample.csv", 'r'),
Cliff> dialect=excel)
Cliff> where excel is either a pre-defined dictionary/class or a
Cliff> user-created dictionary/class.
Skip> Yes, but my string just indexes into a mapping to get to the
Skip> real dict which stores the parameter settings, as I indicated in
Skip> an earlier post:
Skip>
Skip> I was thinking of dialects as dicts. You'd have
Skip>
Skip>     excel_dialect = { "quotechar": '"',
Skip>                       "delimiter": ',',
Skip>                       "linetermintor": '\r\n',
Skip>                       ...
Skip>                     }
Note the spelling error in "linetermintor" - user-constructed
dictionaries are error-prone, because a misspelled key is silently
accepted.
Whenever I find myself using dictionaries for storing values as
opposed to indexing data I can't escape the feeling that my past as a
Perl programmer is coming back to haunt me. At least with Perl there
is some syntactic sugar to make this type of thing less ugly:
    excel_dialect = { quotechar => '"',
                      delimiter => ',',
                      linetermintor => '\r\n' }
In the absence of that sugar I would prefer something like the
following:
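    class excel:
        quotechar = '"'
        delimiter = ','
        lineterminator = '\r\n'

    class exceltsv:
        quotechar = '"'
        delimiter = '\t'
        lineterminator = '\r\n'

    # register each dialect class under its class name
    settings = {}
    for dialect in (excel, exceltsv):
        settings[dialect.__name__] = dialect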
Maybe we could include a name attribute which allowed us to use
'excel-tsv' as a dialect identifier.
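A minimal sketch of the name-attribute idea, written for current Python. The `name` attribute and the fallback to the class name are assumptions for illustration, not something settled in this thread:

```python
class excel:
    name = 'excel'
    quotechar = '"'
    delimiter = ','
    lineterminator = '\r\n'


class exceltsv(excel):
    # 'excel-tsv' is not a valid Python identifier, so it cannot be a
    # class name; the explicit name attribute carries it instead.
    name = 'excel-tsv'
    delimiter = '\t'


settings = {}
for dialect in (excel, exceltsv):
    # Fall back to the class name when a dialect defines no name.
    key = getattr(dialect, 'name', dialect.__name__)
    settings[key] = dialect
```

With this, settings['excel-tsv'] looks up the exceltsv class even though the identifier contains a hyphen.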
Skip> with a corresponding mapping as you suggested:
Skip>
Skip>     settings = { 'excel': excel_dialect,
Skip>                  'excel-tsv': excel_tabs_dialect, }
Skip>
Skip> then in the factory functions do something like:
Skip>
Skip>     def reader(fileobj, dialect="excel", **kwds):
Skip>         kwargs = copy.copy(settings[dialect])
Skip>         kwargs.update(kwds)
Skip>         # possible sanity check on kwargs here ...
Skip>         return _csv.reader(fileobj, **kwargs)
With the class technique this would become:
    def reader(fileobj, dialect=excel, **kwds):
        kwargs = {}
        for key, value in dialect.__dict__.iteritems():
            if not key.startswith('_'):
                kwargs[key] = value
        kwargs.update(kwds)
        return _csv.reader(fileobj, **kwargs)
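For experimenting with the class technique, here is a rough runnable sketch in current Python. It assumes the standard library's csv.reader as a stand-in for the _csv.reader discussed here, and the excel_tsv class and dialect_kwargs helper are hypothetical names:

```python
import csv
import io


class excel_tsv:
    # Hypothetical dialect class holding only parameter settings.
    delimiter = '\t'
    quotechar = '"'
    lineterminator = '\r\n'


def dialect_kwargs(dialect, **overrides):
    # Collect the public class attributes, then let explicit keyword
    # arguments override them.
    kwargs = {key: value for key, value in vars(dialect).items()
              if not key.startswith('_')}
    kwargs.update(overrides)
    return kwargs


def reader(fileobj, dialect=excel_tsv, **kwds):
    # csv.reader stands in for the low-level _csv.reader here.
    return csv.reader(fileobj, **dialect_kwargs(dialect, **kwds))


rows = list(reader(io.StringIO('a\tb\r\n1\t2\r\n')))
```

Note that vars() only sees attributes defined directly on the class, so a dialect that inherits from another would need its base classes walked as well.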
Skip> Did that not make it out? I also think it's cleaner if we have
Skip> a data file which is loaded at import time to define the various
Skip> dialects. That way we aren't mixing too much data into our
Skip> code. It also opens up the opportunity for users to later
Skip> specify their own dialect data files. Where I indicated
Skip> "possible sanity check" above would be a call to a validation
Skip> function on the settings.
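One possible shape for such a validation function, sketched in current Python. The KNOWN_PARAMETERS set and the validate name are assumptions for illustration; the real parameter list was still under discussion:

```python
# Assumed parameter names, taken from the examples in this thread.
KNOWN_PARAMETERS = {'quotechar', 'delimiter', 'lineterminator'}


def validate(kwargs):
    # Reject any key that is not a known dialect parameter, so that a
    # misspelling such as "linetermintor" fails loudly.
    unknown = sorted(set(kwargs) - KNOWN_PARAMETERS)
    if unknown:
        raise ValueError('unknown dialect parameter(s): ' + ', '.join(unknown))
    return kwargs


try:
    validate({'delimiter': ',', 'linetermintor': '\r\n'})
except ValueError as exc:
    message = str(exc)
```

A check like this would run at the "possible sanity check" point in the factory function, after the keyword overrides have been merged in.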
Hmmm... Hard and messy to define classes on the fly. Then we are
back to some kind of dialect object.
    class dialect:
        def __init__(self, quotechar='"', delimiter=',', lineterminator='\r\n'):
            self.quotechar = quotechar
            self.delimiter = delimiter
            self.lineterminator = lineterminator

    settings = { 'excel': dialect(),
                 'excel-tsv': dialect(delimiter='\t') }

    def add_dialect(name, dialect):
        settings[name] = dialect

    def reader(fileobj, args='excel', **kwds):
        kwargs = {}
        # args is either a dialect object or the name of a registered one
        if not isinstance(args, dialect):
            args = settings[args]
        kwargs.update(args.__dict__)
        kwargs.update(kwds)
        return _csv.reader(fileobj, **kwargs)
This would then allow you to extend the settings dictionary on the
fly, or simply pass your own dialect object.
>>> import csv
>>> my_dialect = csv.dialect(lineterminator = '\f')
>>> rdr = csv.reader(file('blah.csv'), my_dialect)
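Both ways of selecting a dialect can be tried out with a rough sketch in current Python. It again assumes the standard library's csv.reader in place of _csv.reader, and the 'pipes' dialect name is made up for the example:

```python
import csv
import io


class dialect:
    def __init__(self, quotechar='"', delimiter=',', lineterminator='\r\n'):
        self.quotechar = quotechar
        self.delimiter = delimiter
        self.lineterminator = lineterminator


settings = {'excel': dialect(),
            'excel-tsv': dialect(delimiter='\t')}


def add_dialect(name, new_dialect):
    settings[name] = new_dialect


def reader(fileobj, args='excel', **kwds):
    # Accept either a dialect object or the name of a registered one.
    if not isinstance(args, dialect):
        args = settings[args]
    kwargs = dict(vars(args))
    kwargs.update(kwds)
    return csv.reader(fileobj, **kwargs)


# Extend the registry on the fly, then select the dialect by name.
add_dialect('pipes', dialect(delimiter='|'))
by_name = list(reader(io.StringIO('a|b\r\n'), 'pipes'))

# Or pass a dialect object directly.
by_object = list(reader(io.StringIO('a;b\r\n'), dialect(delimiter=';')))

# A misspelled parameter is now rejected by the constructor.
try:
    dialect(linetermintor='\n')
except TypeError:
    misspelling_caught = True
else:
    misspelling_caught = False
```

Because the parameters are constructor keywords rather than dictionary keys, the "linetermintor" mistake raises a TypeError instead of passing through unnoticed.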
- Dave
--
http://www.object-craft.com.au