[ python-Bugs-967934 ] csv module cannot handle embedded \r

SourceForge.net noreply at sourceforge.net
Wed Apr 5 17:35:26 CEST 2006


Bugs item #967934, was opened at 2004-06-07 00:46
Message generated for change (Comment added) made by goodger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=967934&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Gregory Bond (gnbond)
Assigned to: Andrew McNamara (andrewmcnamara)
Summary: csv module cannot handle embedded \r

Initial Comment:
The csv module cannot handle an embedded \r (i.e. a
carriage return) in a field.

As far as I can see, this is hard-coded into the _csv.c
file and cannot be fixed with Dialect changes.
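
A minimal sketch of the case being reported, assuming a current
Python 3 interpreter rather than the 2004 release discussed in this
thread. The sample record and the names `raw` and `rows` are
hypothetical and do not come from the original report. It feeds the
reader one record whose quoted second field contains a bare \r:

```python
import csv
import io

# Hypothetical sample: one record whose quoted second field holds a bare
# carriage return. The record itself ends with \r\n.
raw = 'a,"b\rc",d\r\n'

# newline='' stops StringIO from translating line endings, so the csv
# reader sees the embedded \r exactly as written.
rows = list(csv.reader(io.StringIO(raw, newline='')))

# On an interpreter without the restriction, the \r should survive
# inside the second field.
print(rows)
```

On a version that still has the hard-coded restriction, the same
input would be expected to fail or to lose the \r instead.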

----------------------------------------------------------------------

>Comment By: David Goodger (goodger)
Date: 2006-04-05 11:35

Message:
Logged In: YES 
user_id=7733

I just filed a bug (http://www.python.org/sf/1465014) that
seems to be related to this. Revision 38290 on
Modules/_csv.c includes the addition of this code:

    else if (c == '\n' || c == '\r') {
        self->state = EAT_CRNL;
        break;
    }

(and similar code elsewhere). This seems to eat (delete)
control characters, but newlines used to be significant.

Embedded line breaks are allowed, according to RFC 4180
(http://www.ietf.org/rfc/rfc4180.txt). And according to the
Wikipedia entry
(http://en.wikipedia.org/wiki/Comma-separated_values), "a
line break within an element must be preserved."
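
One way to check the "must be preserved" requirement is a write-then-read
round trip. The following is only a sketch, assuming a current Python 3
interpreter. The record contents and the names `record`, `buf` and
`restored` are made up for illustration:

```python
import csv
import io

# Hypothetical record whose middle field contains an embedded CRLF.
record = ['id-1', 'first line\r\nsecond line', 'end']

# newline='' keeps the buffer from translating line endings in either
# direction, so the reader gets back exactly what the writer produced.
buf = io.StringIO(newline='')
csv.writer(buf).writerow(record)

# Rewind and parse the same text again.
buf.seek(0)
restored = next(csv.reader(buf))

# True means the embedded line break came back unchanged.
print(restored == record)
```

The writer quotes the middle field because it contains characters from
the line terminator. If the reader drops or rewrites the embedded line
break, the comparison fails.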

----------------------------------------------------------------------

Comment By: Andrew McNamara (andrewmcnamara)
Date: 2005-01-13 06:34

Message:
Logged In: YES 
user_id=698599

If you're interested, I've just checked in a change to the CVS head for 
Python 2.5 that may, at least partially, fix this problem (if you try it, let me 
know how it goes).

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2004-06-07 07:25

Message:
Logged In: YES 
user_id=44345

It certainly intersects with it somehow.  ;-)  If nothing else, it
will serve as a useful test case.


----------------------------------------------------------------------

Comment By: Andrew McNamara (andrewmcnamara)
Date: 2004-06-07 01:32

Message:
Logged In: YES 
user_id=698599

I suspect this restriction (no CR allowed within a quoted
field) is a historical accident and can be safely removed.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-06-07 01:02

Message:
Logged In: YES 
user_id=80475

Skip, does this coincide with your planned switchover to
universal newlines?

----------------------------------------------------------------------
