[Python-Dev] [Csv] skipfinalspace

John Machin sjmachin at lexicon.net
Mon Oct 20 09:48:10 CEST 2008


Tom Brown wrote:
> (Continuing thread started at 
> http://mail.python.org/pipermail/csv/2008-October/000688.html)
> 
> On Sun, Oct 19, 2008 at 16:46, Andrew McNamara 
> <andrewm at object-craft.com.au> wrote:
> 
>      >I downloaded the 2.6 source tar ball, but is it too late for new
>      >features to get into versions <3?
> 
>     Yep.
> 
>      >How would you feel about adding the following tests to
>      >Lib/test/test_csv.py and getting them to pass?
>      >
>      >Also http://www.python.org/doc/2.5.2/lib/csv-fmt-params.html says
>      >"*skipinitialspace* When True, whitespace immediately following the
>      >delimiter is ignored."
>      >but my tests show whitespace at the start of any field is ignored,
>      >including the first field.
> 
>     I suspect (but I haven't checked) that it means "after the delimiter and
>     before any quoted field" (or some variation on that).
> 
> I agree that whitespace after the delimiter and before any quoted field 
> is skipped. Also whitespace after the start of the line and before any 
> quoted field is skipped.
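
The behaviour described above can be checked with a short script. This is only a sketch run against the Python 3 csv module; the sample line is invented for illustration. It shows that leading spaces are dropped from every field, including the first, and that trailing spaces are left alone.

```python
import csv

# Invented sample line: leading spaces before quoted and unquoted fields,
# plus one trailing space after the unquoted field "x".
line = ' "a", "b c", x , y'

# Default excel dialect: the leading space makes each field unquoted,
# so the quote characters are kept as literal text.
plain = next(csv.reader([line]))

# skipinitialspace=True: spaces at the start of every field are skipped,
# including the first field on the line; trailing spaces are untouched.
skipped = next(csv.reader([line], skipinitialspace=True))

print(plain)    # [' "a"', ' "b c"', ' x ', ' y']
print(skipped)  # ['a', 'b c', 'x ', 'y']
```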

>     All of the "dialect" parameters are there to allow parsing of a specific
>     common form of CSV file. Because there is no formal definition of the
>     format, the module simply aims to parse (and produce the same result)
>     as common applications such as Excel and Access. Changing the behaviour
>     in any non-backwards compatible way is sure to get screams of anguish
>     from many users. Even when the behaviour appears to be a bug, you can
>     be sure people are counting on it working like that.
> 
> 
> skipinitialspace defaults to false and by the same logic skipfinalspace 
> should default to false to preserve compatibility with the csv module in 
> 2.6. On the other hand, the switch to version 3 is as good a time as any 
> to break backwards compatibility to adopt something that works better 
> for new users.

Read Andrew's lips: They don't want "better", they want "the same as MS".

> Based on my experience parsing several hundred csv files generated by many 
> different people I think it would be nice to at least have a dialect 
> that is excel + skipinitialspace=True + skipfinalspace=True.
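
For reference, something close to that dialect can be approximated today without changing the module. This is a rough sketch only: skipfinalspace is the proposed parameter and does not exist in the csv module, so it is emulated here by stripping trailing spaces after parsing. The names ExcelStripped, "excel-stripped" and read_stripped are made up for this example.

```python
import csv


class ExcelStripped(csv.excel):
    """Hypothetical dialect: excel plus skipinitialspace=True."""
    skipinitialspace = True


# Register under an invented name so it can be referred to by string.
csv.register_dialect("excel-stripped", ExcelStripped)


def read_stripped(lines, dialect="excel-stripped"):
    """Yield rows with trailing spaces removed from every field.

    This emulates the proposed skipfinalspace after the fact. Unlike a
    real parser option, it cannot tell quoted from unquoted fields, so
    trailing spaces inside quotes are stripped too.
    """
    for row in csv.reader(lines, dialect=dialect):
        yield [field.rstrip(" ") for field in row]


rows = list(read_stripped([' a , "b c",d ', 'e,f ']))
print(rows)  # [['a', 'b c', 'd'], ['e', 'f']]
```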

Based on my experience extracting data from innumerable csv files (and 
infinite varieties thereof), spreadsheet files, and database tables, in 
99.99% of cases one should automatically apply the following 
transformations to each text field:
    * strip leading whitespace
    * strip trailing whitespace
    * replace embedded runs of whitespace by a single space
and one needs to ensure that the definition of whitespace includes the 
no-break space (NBSP) character.
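
A minimal sketch of those three transformations as a plain function, independent of the input source. The name normalise_space is invented here. In Python 3, \s already matches NBSP for str patterns; it is listed explicitly only to make the intent obvious.

```python
import re

# One or more whitespace characters; U+00A0 (NBSP) is named explicitly
# even though Python 3's \s already covers it for str patterns.
_WS_RUN = re.compile(r"[\s\u00a0]+")


def normalise_space(text):
    """Collapse whitespace runs to one space, then strip both ends."""
    return _WS_RUN.sub(" ", text).strip()


cleaned = normalise_space("  Smith,\u00a0\u00a0 John\t\n Q. ")
print(repr(cleaned))  # 'Smith, John Q.'
```

It can be applied to each field of a csv row, e.g. [normalise_space(f) for f in row], or equally to values from a spreadsheet or database.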

As this "space normalisation" is needed for all input sources, the csv 
module is IMHO the wrong place to put it. A string method would be a 
better idea.

Cheers,
John

