[Csv] Devil in the details, including the small one between delimiters and quotechars

Skip Montanaro skip at pobox.com
Thu Jan 30 15:40:31 CET 2003


    Dave> The current version of the _csv parser can do two things depending
    Dave> upon the value of the strict parameter.

    >>> p.strict  
    0
    >>> p.parse('1,"not quoted" ,"quoted"')
    ['1', 'not quoted ', 'quoted']

Hmmm...  I think this is wrong.  You treated " as the quote character but
tacked the space onto the field even though it came after the closing ",
which should have terminated the field.  I would have expected:

    ['1', 'not quoted', 'quoted']

Barfing when p.strict == 1 seems correct to me.

    Skip> ['1', 'not quoted', 'quoted']

    Dave> Why wouldn't you include the trailing space on the second field?

Because the quoting tells you the field has ended.

    Dave> I think that there are enough variations here that strict is not
    Dave> enough.

I think that when strict == 0, extra whitespace between the terminating
quote and the delimiter, or between the delimiter and the opening quote,
should be discarded.  If the field is not quoted, leading and trailing
whitespace should be ignored as well.  I think that makes the treatment of
whitespace near delimiters uniform (principle of least surprise?).  If
that's not what the user wants, she can damn well set the strict flag to
True and catch the exception. ;-)
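
To make that rule concrete, here is a rough sketch in plain Python.  It is
not the _csv parser; parse_line and ParseError are made-up names, and two
details are my own assumptions: strict mode leaves unquoted fields
untouched, and text other than whitespace after a closing quote is an error
in both modes.

```python
class ParseError(Exception):
    """Hypothetical stand-in for _csv.Error."""


def parse_line(line, delimiter=",", quotechar='"', strict=False):
    """Split one line into fields using the whitespace rule sketched above."""
    fields = []
    i, n = 0, len(line)
    while True:
        # Skip whitespace between the delimiter and the start of the field.
        start = i
        while i < n and line[i] in " \t":
            i += 1
        if i < n and line[i] == quotechar:
            if strict and i != start:
                raise ParseError("whitespace before opening quote")
            i += 1
            chars = []
            while True:
                if i >= n:
                    raise ParseError("unterminated quoted field")
                if line[i] == quotechar:
                    if i + 1 < n and line[i + 1] == quotechar:
                        chars.append(quotechar)  # doubled quote is a literal quote
                        i += 2
                        continue
                    i += 1  # closing quote: the field has ended
                    break
                chars.append(line[i])
                i += 1
            # Skip whitespace between the closing quote and the delimiter.
            end = i
            while i < n and line[i] in " \t":
                i += 1
            if strict and i != end:
                raise ParseError("whitespace after closing quote")
            if i < n and line[i] != delimiter:
                # Assumption: anything else after the closing quote is an error.
                raise ParseError("unexpected text after closing quote")
            fields.append("".join(chars))
        else:
            # Unquoted field: runs to the next delimiter.
            j = line.find(delimiter, i)
            if j == -1:
                j = n
            field = line[start:j]
            # Assumption: strict mode keeps the surrounding whitespace as is.
            fields.append(field if strict else field.strip(" \t"))
            i = j
        if i >= n:
            return fields
        i += 1  # step over the delimiter
```

With strict left at False, parse_line('1,"not quoted" ,"quoted"') gives
['1', 'not quoted', 'quoted']; with strict=True the same line raises
ParseError.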

(Speaking of exceptions, should there be a field in _csv.Error that holds
the raw text that caused the exception?)
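
Something along these lines is what I have in mind.  This is only a sketch:
the Error class below is a stand-in, not the real _csv.Error, and the
raw_text attribute and the require_balanced_quotes helper are names I made
up for illustration.

```python
class Error(Exception):
    """Hypothetical stand-in for _csv.Error that remembers the offending input."""

    def __init__(self, message, raw_text):
        super().__init__(message)
        self.raw_text = raw_text  # the raw line that could not be parsed


def require_balanced_quotes(line, quotechar='"'):
    """Toy check: raise Error, carrying the raw line, on an odd quote count."""
    if line.count(quotechar) % 2:
        raise Error("unbalanced quote character", line)
    return line


try:
    require_balanced_quotes('1,"not quoted ,"quoted"')
except Error as exc:
    # The caller can report exactly which input caused the failure.
    report = "%s: %r" % (exc, exc.raw_text)
```

The caller then has the offending line in hand without having to keep its
own copy around while parsing.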


    Skip> Depends on the setting of skipinitialspaces.  If false, you get
    Skip> ['quoted', ' "not quoted', ' but this ""field"" has delimiters and quotes"']

    Dave> parser does this:

    Dave> ['quoted', ' "not quoted', ' but this ""field"" has delimiters and quotes"']

    Skip> if True, I think you get

    Skip> ['quoted', 'not quoted, but this "field" has delimiters and quotes']

    Dave> Yeah, but the doublequote stuff is only meant for quoted fields
    Dave> (or is it?).

Damn, yeah.  Maybe we have overspecified the parameter set.  Do we need both
strict and skipinitialspaces?  I'd say keep strict and dump
skipinitialspaces, then define fairly precisely what to do when
strict==False.
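
For anyone who wants to try the two readings quoted above, here is a small
check.  It assumes a Python whose standard library has the csv module with a
skipinitialspace parameter on csv.reader (note the spelling, without the
trailing "s").  The input line is reconstructed from the quoted outputs, and
the parse helper is a made-up convenience wrapper.

```python
import csv

# Input line reconstructed from the outputs quoted above (an assumption).
LINE = '"quoted", "not quoted, but this ""field"" has delimiters and quotes"'


def parse(line, **fmtparams):
    """Hypothetical helper: parse a single line with csv.reader."""
    return next(csv.reader([line], **fmtparams))


# The space after the comma starts an unquoted field, so the quotes that
# follow are kept literally and the embedded comma splits the field.
kept = parse(LINE, skipinitialspace=False)

# The space after the comma is dropped, so the second field is quoted and
# the doubled quotes collapse to single ones.
skipped = parse(LINE, skipinitialspace=True)

print(kept)
print(skipped)
```

The first call yields three fields and the second yields two, which is the
difference being argued about here.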

    Cliff> I propose space between delimiters and quotes raise an exception
    Cliff> and let's be done with it.  I don't think this really affects
    Cliff> Excel compatibility since Excel will never generate this type of
    Cliff> file and doesn't require it for import.  It's true that some
    Cliff> files that Excel would import (probably incorrectly) won't import
    Cliff> in CSV, but I think that's outside the scope of Excel
    Cliff> compatibility.

    Skip> Sounds good to me.

I can never remember my past train of thought from one day to the next. :-(

can-you-hear-me-waffling?-ly y'rs,

Skip

