[Python-Dev] [Csv] These csv test cases seem incorrect to me...

John Machin sjmachin at lexicon.net
Mon Mar 12 05:13:25 CET 2007


On 12/03/2007 1:41 PM, Andrew McNamara wrote:
> 
> The point was to produce the same results as Excel. Sure, Excel probably
> doesn't generate crap like this itself, but 3rd parties do, and people
> complain if we don't parse it just like Excel (sigh).

Let's put a little flesh on those a's and b's:

A typical example of the first case is where a database address line 
contains a quoted house name e.g.

"Dunromin", 123 Main Street

and the producer of the CSV file has not done any quoting at all.

An example of the 2nd case is a database address line like this:

C/o Mrs Jones, "Dunromin", 123 Main Street

and the producer of the CSV file has merely wrapped quotes about it 
without doubling the existing quotes, to emit this:

"C/o Mrs Jones, "Dunromin", 123 Main Street"

which Excel and adherents would distort to two fields containing:
'C/o Mrs Jones, Dunromin"' and ' 123 Main Street"' -- aarrgghh!!
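For anyone who wants to see this for themselves, here is a small check
against the csv module's default "excel" dialect. It is only an
illustration written for a current Python 3 interpreter; the variable
names are mine, and the expected results in the comments are what the
reader's non-strict state machine produces for these two inputs.

```python
import csv

# First case: the producer did no quoting at all.
unquoted = '"Dunromin", 123 Main Street'
# Second case: the producer wrapped quotes around the data
# without doubling the quotes that were already there.
wrapped = '"C/o Mrs Jones, "Dunromin", 123 Main Street"'

# csv.reader accepts any iterable of lines; each list holds one line.
first_case = next(csv.reader([unquoted]))
second_case = next(csv.reader([wrapped]))

print(first_case)   # ['Dunromin', ' 123 Main Street']
print(second_case)  # ['C/o Mrs Jones, Dunromin"', ' 123 Main Street"']
```

In both cases the single address line comes back as two fields, and
quotes that were part of the data are dropped or displaced.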

People who complain as described are IMHO misguided; they are accepting 
crap and losing data (yes, the quotes in the above examples are *DATA*). 
Why should we heed their complaints?

Perhaps we could consider a non-default "dopey_like_Excel" option for 
csv :-)

BTW, it is possible to do a reasonable recovery job when the producer's 
protocol was to wrap quotes around the data without doubling existing 
quotes, provided there was an even number of quotes in the data to start 
with. It just requires a quite different finite state machine.
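A minimal sketch of what such a state machine might look like follows.
It is not the csv module's algorithm, and the function name
split_wrapped_line is hypothetical. It assumes that a field starting
with a quote was wrapped by the producer, that embedded quotes were not
doubled, and that each field held an even number of quotes to start
with. Inside a wrapped field, a quote is taken as the closing wrapper
only if it is followed by the delimiter or end of line and the number
of embedded quotes seen so far is even. This is a heuristic and can
still guess wrong on some data.

```python
def split_wrapped_line(line, delimiter=",", quote='"'):
    """Split one line whose fields were wrapped in quotes without doubling.

    Heuristic sketch only; assumes an even number of quotes in each
    wrapped field's original data.
    """
    fields = []
    buf = []
    in_quoted = False      # currently inside a wrapped field
    embedded = 0           # data quotes seen so far in this wrapped field
    at_field_start = True  # no character of the current field consumed yet
    n = len(line)
    for i, ch in enumerate(line):
        nxt = line[i + 1] if i + 1 < n else None  # None marks end of line
        if in_quoted:
            if ch == quote:
                if (nxt is None or nxt == delimiter) and embedded % 2 == 0:
                    in_quoted = False   # closing wrapper quote
                else:
                    embedded += 1       # a quote that is part of the data
                    buf.append(ch)
            else:
                buf.append(ch)          # includes delimiters inside the field
        elif ch == delimiter:
            fields.append("".join(buf))
            buf = []
            at_field_start = True
        elif ch == quote and at_field_start:
            in_quoted = True            # opening wrapper quote
            embedded = 0
            at_field_start = False
        else:
            buf.append(ch)
            at_field_start = False
    fields.append("".join(buf))
    return fields


recovered = split_wrapped_line('"C/o Mrs Jones, "Dunromin", 123 Main Street"')
print(recovered)  # ['C/o Mrs Jones, "Dunromin", 123 Main Street']
```

On the address example this returns the original line as one field with
its quotes intact.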

Cheers,
John



