[Python-Dev] [Csv] skipfinalspace

Andrew McNamara andrewm at object-craft.com.au
Mon Oct 20 08:38:05 CEST 2008


>>>I downloaded the 2.6 source tar ball, but is it too late for new
>>>features to get into versions <3?
>>
>> Yep.

Sigh - I should slow down and actually read the e-mail I'm replying
to. It is not too late to get features into versions <3. It is, however,
too late to get features into 2.6, which was not what you asked, but
what I was answering "Yep" to.

>>>How would you feel about adding the following tests to
>>>Lib/test/test_csv.py and getting them to pass?

I have no real objection to someone adding a skipfinalspace parameter and
associated tests, although I have no time to do it myself at the moment.

>> >Also http://www.python.org/doc/2.5.2/lib/csv-fmt-params.html says
>> >"*skipinitialspace *When True, whitespace immediately following the
>> >delimiter is ignored."
>> >but my tests show whitespace at the start of any field is ignored,
>> >including the first field.
>>
>> I suspect (but I haven't checked) that it means "after the delimiter and
>> before any quoted field (or some variation on that).
>
>I agree that whitespace after the delimiter and before any quoted field is
>skipped. Also whitespace after the start of the line and before any quoted
>field is skipped.

I'm not sure if we're talking about the same thing - it seems to work as I
expect it to work:

    >>> import csv
    >>> list(csv.reader([' foo, bar']))
    [[' foo', ' bar']]
    >>> list(csv.reader([' foo, bar'], skipinitialspace=1))
    [['foo', 'bar']]

BTW, I think the reason "skipinitialspace" exists at all is to support
this:

    >>> list(csv.reader([' foo, " bar"']))
    [[' foo', ' " bar"']]
    >>> list(csv.reader([' foo, " bar"'], skipinitialspace=1))
    [['foo', ' bar']]

The quoting is only valid if the quote is the first character encountered
in the field (this is how Excel works). However, some other CSV generators
insert a space after the comma, and expect the parser to still treat it
as a quoted field - so skipinitialspace eats the space leading up to the
quote, but does not eat any space after the quote (hence the "initial"
in the name).

For symmetry, a "skipfinalspace" option should do the same - only eat
space after the quote (if quotes are used) - however this will be rather
hard to implement as the parser state has already rolled on, and you
no longer know whether the field was quoted. Eating spaces that
appeared within the quotes is the wrong thing to do.
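
To illustrate the problem, here is a minimal sketch (not a proposed
implementation) of what happens if trailing space is stripped after the
reader has run. The helper name naive_strip_final is made up for this
example. Once parsed, a quoted and an unquoted field look identical, so
stripping afterwards also eats space that was inside the quotes:

```python
import csv

# The same text twice on one line: first unquoted, then quoted.
line = 'bar  ,"bar  "'

# After parsing, nothing records which field was quoted.
rows = list(csv.reader([line]))


def naive_strip_final(parsed_rows):
    """Hypothetical post-processing step: strip trailing whitespace
    from every field, quoted or not (the reader cannot tell us which)."""
    return [[field.rstrip() for field in row] for row in parsed_rows]


stripped = naive_strip_final(rows)

print(rows)      # both fields come back as the same string
print(stripped)  # both lose their trailing spaces, including the quoted one
```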

>skipinitialspace defaults to false and by the same logic skipfinalspace
>should default to false to preserve compatibility with the csv module in
>2.6. On the other hand, the switch to version 3 is as good a time as any to
>break backwards compatibility to adopt something that works better for new
>users.

No, by default it needs to work like Excel, because this is the de facto
standard.

>Based on my experience parsing several hundred csv generated by many
>different people I think it would be nice to at least have a dialect that is
>excel + skipinitialspace=True + skipfinalspace=True.

Once the "skipfinalspace" parameter is implemented, there is nothing
stopping you creating such a dialect in your code, but I don't support
adding it to the standard library - the dialects in the std lib should
be well defined (in some way).

BTW, it's not necessary to create dialect objects: as I've done above,
users can pass keyword parameters to the parser if it's more convenient.
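
As a rough sketch of both approaches, using only what exists today: the
class name ExcelSkipSpace and the registered name 'excel-skipspace' are
invented for this example, and skipfinalspace is left commented out
because no such parameter exists yet.

```python
import csv


class ExcelSkipSpace(csv.excel):
    """Hypothetical user-defined dialect: Excel plus skipinitialspace."""
    skipinitialspace = True
    # skipfinalspace = True  # assumption: only if such an option is ever added


# Register the dialect under an (invented) name so it can be used by name.
csv.register_dialect('excel-skipspace', ExcelSkipSpace)

data = [' foo, " bar"']

# Using the dialect...
via_dialect = list(csv.reader(data, dialect='excel-skipspace'))

# ...or passing the keyword parameter directly, with no dialect object.
via_keywords = list(csv.reader(data, skipinitialspace=True))

print(via_dialect)
print(via_keywords)
```

Both calls should give the same result as the skipinitialspace example
earlier in this message.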

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

