[Csv] Status
Cliff Wells
LogiplexSoftware at earthlink.net
Thu Jan 30 18:57:45 CET 2003
On Wed, 2003-01-29 at 18:48, Skip Montanaro wrote:
> It would appear we are converging on dialects as data-only classes
> (subclassable but with no methods). I'll update the PEP. Many other ideas
> have been floating through the list, and while I haven't been deleting the
> messages, I haven't been adding them to the PEP either. Can someone help
> with that?
A comment on the dialect classes: I think a validate() method would be
good in the base dialect class. A separate validate function would do
just as well, but it seems logical to make it part of the class.
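
A minimal sketch of what such a validate() method might look like, assuming the parameter names from the list further down (delimiter, quote_char, quoting, and so on). The class name, the default values and the choice to return a list of problems rather than raise are all assumptions made for illustration, not anything settled in the PEP:

```python
class Dialect:
    """Hypothetical base dialect: data-only attributes plus validate()."""
    delimiter = ","
    quote_char = '"'
    escape_char = None
    quoting = "auto"
    line_terminator = "\r\n"

    # Quoting modes as named in this thread.
    _QUOTING = ("auto", "always", "nonnumeric", "never")

    @classmethod
    def validate(cls):
        """Return a list of problems; an empty list means the dialect looks sane."""
        errors = []
        if not isinstance(cls.delimiter, str) or len(cls.delimiter) != 1:
            errors.append("delimiter must be a single character")
        if cls.quoting not in cls._QUOTING:
            errors.append("quoting must be one of %r" % (cls._QUOTING,))
        elif cls.quoting != "never" and (
            not isinstance(cls.quote_char, str) or len(cls.quote_char) != 1
        ):
            errors.append("quote_char must be a single character")
        return errors


class Excel(Dialect):
    """Subclass that inherits the defaults unchanged."""


class Broken(Dialect):
    """Subclass with a deliberately invalid delimiter."""
    delimiter = "::"
```

Because validate() is a classmethod, a subclass that only overrides data attributes still gets checked without defining any methods of its own.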
> I'd like to get the wording in the PEP to converge on our current thoughts
> and announce it on c.l.py and python-dev sometime tomorrow. I think we will
> get a lot of feedback from both camps, hopefully some of it useful. ;-)
Undoubtedly Timothy Rue will inform us that we are wasting our time as
the VIC will solve this problem as well (after all, input->9
commands->output), but if you think you can live with that, sure.
> I just finished making a pass through the messages I hadn't deleted (and
> then saved them to a csv mbox file since the list appears to still not be
> archiving). Here's what I think we've concluded:
>
> * Dialects are a set of defaults, probably implemented as classes (which
> allows subclassing, whereas dicts wouldn't) and the default dialect
> named as something like csv.dialects.excel or "excel" if we allow
> string specifiers. (I think strings work well at the API, simply
> because they are shorter and can more easily be presented in GUI
> tools.)
Agreed. Just to clarify, these strings will still be stored in a
dictionary ("settings" or "dialects")?
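
To make the question concrete, here is one way the string names could map to dialect classes through a module-level dictionary. The names `dialects`, `get_dialect`, `Excel` and `ExcelTab` are hypothetical, chosen only for this sketch:

```python
class Excel:
    """Hypothetical default dialect (data only)."""
    delimiter = ","
    quote_char = '"'


class ExcelTab(Excel):
    """Hypothetical tab-delimited variant, defined by subclassing."""
    delimiter = "\t"


# Registry of string specifiers, as a plain dictionary.
dialects = {"excel": Excel, "excel-tab": ExcelTab}


def get_dialect(spec):
    """Accept either a registered name or a dialect class."""
    if isinstance(spec, str):
        try:
            return dialects[spec]
        except KeyError:
            raise ValueError("unknown dialect: %r" % (spec,)) from None
    return spec
```

With this arrangement the API can take "excel" from a GUI drop-down and the class itself from code that wants to subclass it.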
> * A csvutils module should be at least scoped out which might do a fair
> number of things:
>
> - Implements one or more sniffers for parameter types
>
> - Validates CSV files (e.g., constant number of columns, type
> constraints on column values, compares against given dialect)
>
> - Generates a sniffer from a CSV file
>
> * These individual parameters are necessary (hopefully the names will be
> enough clue as to their meaning): quote_char, quoting ("auto",
> "always", "nonnumeric", "never"), delimiter, line_terminator,
> skip_whitespace, escape_char, hard_return. Are there others?
>
> * We're still undecided about None (I certainly don't think it's a valid
> value to be writing to CSV files)
IMO, None should be mapped to '', so [None, None, None] would be saved
as ,, or "","","" if quoting="always". I can't think of any reasonable
alternative. However, it is arguable whether reading ,, should return
[None,None,None] or ['','','']. I'd vote for the latter since we
explicitly are not doing conversions between strings and Python types
('6' doesn't become 6).
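
A rough sketch of that behaviour, assuming the quoting names from the list above. The helpers `format_row` and `parse_simple_row` are hypothetical, the doubling of embedded quote characters is an assumption borrowed from Excel's convention, and the reader side deliberately ignores quoting altogether:

```python
def format_row(row, delimiter=",", quote_char='"', quoting="auto"):
    """Join one row into a CSV line, writing None as an empty field."""
    fields = []
    for value in row:
        text = "" if value is None else str(value)
        special = (delimiter, quote_char, "\r", "\n")
        needs_quote = quoting == "always" or (
            quoting == "auto" and any(ch in text for ch in special)
        )
        if needs_quote:
            # Assumption: embedded quotes are escaped by doubling them.
            text = quote_char + text.replace(quote_char, quote_char * 2) + quote_char
        fields.append(text)
    return delimiter.join(fields)


def parse_simple_row(line, delimiter=","):
    """Split an unquoted line; every field comes back as a string."""
    return line.split(delimiter)


print(format_row([None, None, None]))                    # ,,
print(format_row([None, None, None], quoting="always"))  # "","",""
print(parse_simple_row(",,"))                            # ['', '', '']
print(parse_simple_row("6,7"))                           # ['6', '7']
```

The round trip is lossy on purpose: None goes out as an empty field and comes back as '', which matches the decision not to convert strings into Python types on input.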
> * Rows can have variable numbers of columns and the application is
> responsible for deciding on and enforcing max_rows or max_cols.
>
> * Don't raise exceptions needlessly. For example, specifying
> quoting="never" and not specifying a value for escape_char would be
> okay until you encounter a field when writing which contains the
> delimiter.
>
> * Files have to be opened in binary mode (we can check the mode
> attribute I believe) so we can do the right thing with line
> terminators.
>
> * Data values should always be returned as strings, even if they are
> valid numbers. Let the application do data conversion.
>
> Other stuff we haven't talked about much:
>
> * Unicode. I think we punt on this for now and just pretend that
> passing codecs.open(csvfile, mode, encoding) is sufficient. I'm sure
> Martin von Löwis will let us know if it isn't. ;-) Dave said, "The low
> level parser (C code) is probably going to need to handle unicode."
> Let's wait and see how well codecs.open() works for us.
>
> * We know we need tests but haven't talked much about them. I vote for
> PyUnit as much as possible, though a certain amount of manual testing
> using existing spreadsheets and databases will be required.
+1. Testing all the corner cases is going to take some care.
> * Exceptions. We know we need some. We should start with CSVError and
> try to avoid getting carried away with things. If need be, we can add
> a code field to the class. I don't like the idea of having 17
> different subclasses of CSVError though. It's too much complexity for
> most users.
I can only count to 12 (or was it 11?), so this would be good for me as
well.
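
For what it's worth, a single exception with an optional code field could be as small as the sketch below. The code constant and the `check_field` helper are made up for illustration; nothing here has been agreed on:

```python
class CSVError(Exception):
    """Single exception type; 'code' distinguishes the cases if needed."""

    def __init__(self, message, code=None):
        super().__init__(message)
        self.code = code


# Hypothetical code value, not part of any agreed design.
NEED_ESCAPE = 1


def check_field(text, delimiter=",", quoting="never", escape_char=None):
    """Raise only when a field actually cannot be written as configured."""
    if quoting == "never" and escape_char is None and delimiter in text:
        raise CSVError("field contains delimiter but no escape_char is set",
                       code=NEED_ESCAPE)
    return text
```

This also fits the "don't raise exceptions needlessly" point: the missing escape_char is only an error once a field that needs it turns up.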
--
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308 (800) 735-0555 x308