[ python-Bugs-1465014 ] CSV regression in 2.5a1: multi-line cells

Fri Jun 23 05:13:21 CEST 2006

Bugs item #1465014, was opened at 2006-04-05 11:14
Message generated for change (Comment added) made by goodger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1465014&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.5
>Status: Closed
>Resolution: Rejected
Priority: 9
Submitted By: David Goodger (goodger)
Assigned to: Andrew McNamara (andrewmcnamara)
Summary: CSV regression in 2.5a1: multi-line cells

Initial Comment:
Running the attached csv_test.py under Python 2.4.2
(Windows XP SP1) produces:

>c:\apps\python24\python.exe ./csv_test.py
['one', '2', 'three (line 1)\n(line 2)']

Note that the third item in the row contains a newline
between "(line 1)" and "(line 2)".

With Python 2.5a1, I get:

>c:\apps\python25\python.exe ./csv_test.py
['one', '2', 'three (line 1)(line 2)']

Notice the missing newline, which is significant.  The
CSV module under 2.5a1 seems to lose data.

----------------------------------------------------------------------

>Comment By: David Goodger (goodger)
Date: 2006-06-22 23:13

Message:
Logged In: YES 
user_id=7733

I didn't realize that the previous behavior was buggy; I
thought that the current behavior was a side-effect.  The
2.5 behavior did cause a small problem in Docutils, but it's
already been fixed.  I just wanted to ensure that no
regression was creeping into 2.5.

Thanks for the explanation!  Perhaps it could be added to
the docs in some form?

Marking the bug report closed.

----------------------------------------------------------------------

Comment By: Andrew McNamara (andrewmcnamara)
Date: 2006-06-22 20:27

Message:
Logged In: YES 
user_id=698599

The previous behaviour caused considerable problems,
particularly on platforms that did not use the Unix
line-ending conventions, or with files that originated on
those platforms: users were finding mysterious newlines
where they didn't expect them.

Quoted fields exist to allow characters that would otherwise
be considered part of the syntax to appear within the field.
So yes, quoted fields are a special case, and necessarily
so.

The current behaviour puts the control back in the hands of 
the user of the module: if literal newlines are important 
within a field, they need to read their file in a way that 
preserves the newlines. The old behaviour would introduce 
spurious characters into quoted fields, with no way for the 
user to control that behaviour.
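
One way a caller might take that control, sketched for current Python 3 rather than the 2.5 code under discussion: if the input strings have already lost their line endings, re-append them before csv.reader sees them. The helper name with_line_endings and the sample data are assumptions made for illustration; neither comes from the csv module or from this report.

```python
import csv


def with_line_endings(lines, newline="\n"):
    """Hypothetical helper: re-append a line ending to strings that lack one."""
    for line in lines:
        if line.endswith(("\n", "\r")):
            yield line            # already terminated, pass through unchanged
        else:
            yield line + newline  # restore the ending the caller stripped


# Input whose line endings were stripped before reaching the reader.
stripped_lines = ['one,2,"three (line 1)', '(line 2)"']

rows = list(csv.reader(with_line_endings(stripped_lines)))
print(rows)
```

The caller decides which line ending to restore, so a file that used "\r\n" can be rebuilt with newline="\r\n" instead.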

I'm sorry that the change causes you problems. With a format 
that's as loosely defined as CSV, it's an unfortunate fact 
of life that there are going to be conflicting requirements. 

----------------------------------------------------------------------

Comment By: David Goodger (goodger)
Date: 2006-06-22 14:17

Message:
Logged In: YES 
user_id=7733

I see what you're saying, but I disagree.  In Python 2.4,
csv.reader did not require newlines, but in Python 2.5 it
does.  That's a significant behavioral change.  In the
stdlib csv "Module Contents" docs for csv.reader, it says:
"csvfile can be any object which supports the iterator
protocol and returns a string each time its next method is
called."  It doesn't mention newline-terminated strings.

In any case, the behavior is inconsistent: newlines are not
required to terminate row-ending strings, but only strings
which end inside cells split across rows.  Why the discrepancy?
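
A small sketch of the discrepancy described above, run against current Python 3 (an assumption; the thread concerns 2.4 and 2.5). The sample strings are invented for illustration. None of the strings ends with a newline, yet rows still end where expected; only the quoted cell that spans two strings is affected.

```python
import csv

# Strings without trailing newlines still terminate rows normally.
unquoted = list(csv.reader(['a,b', 'c,d']))

# A quoted cell split across two newline-less strings is joined
# with nothing in between.
split_cell = list(csv.reader(['a,"b', 'c",d']))

print(unquoted)
print(split_cell)
```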

----------------------------------------------------------------------

Comment By: Andrew McNamara (andrewmcnamara)
Date: 2006-06-20 19:17

Message:
Logged In: YES 
user_id=698599

I think your problem is with str.splitlines(), rather than 
the csv.reader: splitlines ate the newline. If you pass it 
True as an argument, it will retain the end-of-line 
character in the resulting strings.
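
A minimal sketch of the difference, assuming current Python 3 and sample data modelled on the output quoted in the initial comment (the attached csv_test.py is not reproduced here):

```python
import csv

data = 'one,2,"three (line 1)\n(line 2)"\n'

# splitlines() drops the line endings, so the quoted cell is rejoined
# without the newline.
stripped = list(csv.reader(data.splitlines()))

# splitlines(True) keeps the line endings, so the newline inside the
# quoted cell survives.
kept = list(csv.reader(data.splitlines(True)))

print(stripped)
print(kept)
```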

----------------------------------------------------------------------

Comment By: David Goodger (goodger)
Date: 2006-05-02 17:04

Message:
Logged In: YES 
user_id=7733

Assigned to Andrew McNamara, since his change appears to
have caused this regression (revision 38290 on
Modules/_csv.c).

----------------------------------------------------------------------

Comment By: David Goodger (goodger)
Date: 2006-05-02 16:58

Message:
Logged In: YES 
user_id=7733

Further investigation has revealed that the regression only
affects iterator I/O, not file I/O.  The attached
csv_test.py demonstrates.  Run with Python 2.5 to get:

results from file I/O:
[['one', '2', 'three (line 1)\n(line 2)']]

results from iterator I/O:
[['one', '2', 'three (line 1)(line 2)']]
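
The comparison can be sketched roughly as follows. This is not the attached csv_test.py: the file name, the sample data, and the use of splitlines() for the iterator case are assumptions, and the code targets current Python 3.

```python
import csv
import os
import tempfile

data = 'one,2,"three (line 1)\n(line 2)"\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")  # hypothetical file name
    with open(path, "w", newline="") as f:
        f.write(data)
    # File I/O: iterating a file yields lines with their endings intact.
    with open(path, newline="") as f:
        file_rows = list(csv.reader(f))

# Iterator I/O: a list of strings whose line endings were stripped.
iterator_rows = list(csv.reader(data.splitlines()))

print("results from file I/O:", file_rows)
print("results from iterator I/O:", iterator_rows)
```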

----------------------------------------------------------------------

Comment By: David Goodger (goodger)
Date: 2006-04-05 11:44

Message:
Logged In: YES 
user_id=7733

This bug seems to be a side effect of revision 38290 on
Modules/_csv.c, which was prompted by bug 967934
(http://www.python.org/sf/967934).

----------------------------------------------------------------------
