[ python-Bugs-1465014 ] CSV regression in 2.5a1: multi-line cells

Thu Jun 22 20:17:35 CEST 2006

Bugs item #1465014, was opened at 2006-04-05 11:14
Message generated for change (Comment added) made by goodger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1465014&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 9
Submitted By: David Goodger (goodger)
Assigned to: Andrew McNamara (andrewmcnamara)
Summary: CSV regression in 2.5a1: multi-line cells

Initial Comment:
Running the attached csv_test.py under Python 2.4.2
(Windows XP SP1) produces:

>c:\apps\python24\python.exe ./csv_test.py
['one', '2', 'three (line 1)\n(line 2)']

Note that the third item in the row contains a newline
between "(line 1)" and "(line 2)".

With Python 2.5a1, I get:

>c:\apps\python25\python.exe ./csv_test.py
['one', '2', 'three (line 1)(line 2)']

Notice the missing newline, which is significant.  The
CSV module under 2.5a1 seems to lose data.

----------------------------------------------------------------------

>Comment By: David Goodger (goodger)
Date: 2006-06-22 14:17

Message:
Logged In: YES 
user_id=7733

I see what you're saying, but I disagree.  In Python 2.4,
csv.reader did not require newlines, but in Python 2.5 it
does.  That's a significant behavioral change.  In the
stdlib csv "Module Contents" docs for csv.reader, it says:
"csvfile can be any object which supports the iterator
protocol and returns a string each time its next method is
called."  It doesn't mention newline-terminated strings.

In any case, the behavior is inconsistent: newlines are not
required to terminate row-ending strings, but only strings
which end inside cells split across rows.  Why the discrepancy?

----------------------------------------------------------------------

Comment By: Andrew McNamara (andrewmcnamara)
Date: 2006-06-20 19:17

Message:
Logged In: YES 
user_id=698599

I think your problem is with str.splitlines(), rather than 
the csv.reader: splitlines ate the newline. If you pass it 
True as an argument, it will retain the end-of-line 
character in the resulting strings.

----------------------------------------------------------------------

Comment By: David Goodger (goodger)
Date: 2006-05-02 17:04

Message:
Logged In: YES 
user_id=7733

Assigned to Andrew McNamara, since his change appears to
have caused this regression (revision 38290 on
Modules/_csv.c).

----------------------------------------------------------------------

Comment By: David Goodger (goodger)
Date: 2006-05-02 16:58

Message:
Logged In: YES 
user_id=7733

Further investigation has revealed that the regression only
affects iterator I/O, not file I/O.  The attached
csv_test.py demonstrates.  Run with Python 2.5 to get:

results from file I/O:
[['one', '2', 'three (line 1)\n(line 2)']]

results from iterator I/O:
[['one', '2', 'three (line 1)(line 2)']]

----------------------------------------------------------------------

Comment By: David Goodger (goodger)
Date: 2006-04-05 11:44

Message:
Logged In: YES 
user_id=7733

This bug seems to be a side effect of revision 38290 on
Modules/_csv.c, which was prompted by bug 967934
(http://www.python.org/sf/967934).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1465014&group_id=5470