[Python-Dev] [Csv] skipfinalspace
John Machin
sjmachin at lexicon.net
Mon Oct 20 09:48:10 CEST 2008
Tom Brown wrote:
> (Continuing thread started at
> http://mail.python.org/pipermail/csv/2008-October/000688.html)
>
> On Sun, Oct 19, 2008 at 16:46, Andrew McNamara
> <andrewm at object-craft.com.au> wrote:
>
> >I downloaded the 2.6 source tar ball, but is it too late for new features to
> >get into versions <3?
>
> Yep.
>
> >How would you feel about adding the following tests to Lib/test/test_csv.py
> >and getting them to pass?
> >
> >Also http://www.python.org/doc/2.5.2/lib/csv-fmt-params.html says
> >"skipinitialspace: When True, whitespace immediately following the
> >delimiter is ignored."
> >but my tests show whitespace at the start of any field is ignored, including
> >the first field.
>
> I suspect (but I haven't checked) that it means "after the delimiter and
> before any quoted field" (or some variation on that).
>
> I agree that whitespace after the delimiter and before any quoted field
> is skipped. Also whitespace after the start of the line and before any
> quoted field is skipped.
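
The behaviour described above can be checked with a short script. This is
only a sketch against the csv module as shipped with current Python 3; the
sample line is made up for illustration.

```python
import csv
import io

# Made-up sample line: spaces before the first field, before a plain
# field, before a quoted field, and after the last field.
data = ' a, b, "c, d",e \n'

# With skipinitialspace=True, spaces at the start of every field are
# dropped, including the first field on the line. Trailing spaces stay.
skipped = list(csv.reader(io.StringIO(data), skipinitialspace=True))
print(skipped)   # [['a', 'b', 'c, d', 'e ']]

# With the default (False), the leading spaces are kept, so the quote
# character is no longer at the start of its field and is not treated
# as opening a quoted field.
default = list(csv.reader(io.StringIO(data)))
print(default)
```

Note that only the space character is skipped by this option, and nothing
is done about whitespace at the end of a field.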
> All of the "dialect" parameters are there to allow parsing of a specific
> common form of CSV file. Because there is no formal definition of the
> format, the module simply aims to parse (and produce the same result)
> as common applications such as Excel and Access. Changing the behaviour
> in any non-backwards compatible way is sure to get screams of anguish
> from many users. Even when the behaviour appears to be a bug, you can
> be sure people are counting on it working like that.
>
>
> skipinitialspace defaults to false and by the same logic skipfinalspace
> should default to false to preserve compatibility with the csv module in
> 2.6. On the other hand, the switch to version 3 is as good a time as any
> to break backwards compatibility to adopt something that works better
> for new users.
Read Andrew's lips: They don't want "better", they want "the same as MS".
> Based on my experience parsing several hundred csv files generated by many
> different people, I think it would be nice to at least have a dialect
> that is excel + skipinitialspace=True + skipfinalspace=True.
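
For illustration, something close to that dialect can be approximated today.
This is a sketch under assumptions: skipfinalspace is the proposed option and
is not a parameter the csv module accepts, so trailing spaces are removed
after parsing instead. The names ExcelSkipSpace and read_stripped are
hypothetical.

```python
import csv
import io


class ExcelSkipSpace(csv.excel):
    """Hypothetical dialect: Excel rules plus skipinitialspace."""
    skipinitialspace = True


def read_stripped(lines, dialect=ExcelSkipSpace):
    """Yield rows with trailing spaces removed from every field.

    This stands in for the proposed skipfinalspace option, which the
    csv module does not provide. It runs after parsing, so it also
    strips trailing spaces that were inside quoted fields.
    """
    for row in csv.reader(lines, dialect=dialect):
        yield [field.rstrip(" ") for field in row]


rows = list(read_stripped(io.StringIO("a , b ,c\n")))
print(rows)   # [['a', 'b', 'c']]
```

Because the stripping happens outside the parser, it cannot tell a quoted
trailing space from an unquoted one; a real parser option could.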
Based on my experience extracting data from innumerable csv files (and
infinite varieties thereof), spreadsheet files, and database tables, in
99.99% of cases one should automatically apply the following
transformations to each text field:
* strip leading whitespace
* strip trailing whitespace
* replace embedded runs of whitespace by a single space
and one needs to ensure that the definition of whitespace includes the
no-break space (NBSP) character.
As this "space normalisation" is needed for all input sources, the csv
module is IMHO the wrong place to put it. A string method would be a
better idea.
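
A minimal sketch of the three transformations as a stand-alone helper,
independent of where the text came from. The name normalise_space is
hypothetical, and the code assumes Python 3 str input; NBSP is listed
explicitly in the pattern so the intent is visible.

```python
import re

# One or more whitespace characters. In Python 3, \s on str patterns
# already covers Unicode whitespace; U+00A0 (NBSP) is spelled out too.
_WS_RUN = re.compile(r"[\s\u00a0]+")


def normalise_space(text):
    """Collapse whitespace runs to one space and strip both ends."""
    # After the substitution every run is a single ordinary space,
    # so stripping spaces is enough to trim the ends.
    return _WS_RUN.sub(" ", text).strip(" ")


print(normalise_space("  foo\u00a0\u00a0 bar\t\n baz "))   # foo bar baz
```

It can be applied to each field of a csv row, a spreadsheet cell, or a
database column value in the same way.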
Cheers,
John