[ python-Bugs-1072404 ] Bugs in _csv module - lineterminator
SourceForge.net
noreply at sourceforge.net
Thu Jan 13 05:14:32 CET 2005
Bugs item #1072404, was opened at 2004-11-24 23:00
Message generated for change (Comment added) made by andrewmcnamara
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1072404&group_id=5470
Category: Python Library
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Chris Withers (fresh)
>Assigned to: Andrew McNamara (andrewmcnamara)
Summary: Bugs in _csv module - lineterminator
Initial Comment:
On trying to parse a '\r' terminated csv generated on a
Mac, I get a "newline inside string" error from the csv
module.
Two things sprung to mind having read:
http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Modules/_csv.c?rev=1.15&view=markup
...for a bit.
1. The Dialect's lineterminator doesn't appear to be
used when parsing a CSV. This feels like a bug to me,
'cos I could specify the terminator if
Reader_iternext(ReaderObj *self) used it :-S
2. The processing in Reader_iternext(ReaderObj *self)
assumes that a '\r' will be followed by '\0' for Macs,
'\n' for windows, and anything else is an error.
but:
>>> c = open('var\data\metadata.csv').read()
>>> c[:100]
'BENEFIT,,Subjects relating to all benefits,AB
\rBENEFIT,PARTNERDIED,Bereavement
Should I be expecting to see a '\0' there?
Anyway, the real bug seems to be the reader's ignorance
of the lineterminator. However, even if my analysis is
off the mark, the problem still exists :-S
cheers,
Chris
----------------------------------------------------------------------
>Comment By: Andrew McNamara (andrewmcnamara)
Date: 2005-01-13 15:14
Message:
Logged In: YES
user_id=698599
The reader expects to be supplied an iterator that returns lines - in this
case, the file iterator has not recognised \r as end-of-line and has read the
whole file in and yielded that as a "line". If you use universal-newline mode
on your source file, you should have more luck.
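
A minimal sketch of that suggestion, written for current Python 3 rather than
the 2005 interpreter (where the equivalent was opening the file with mode 'U').
The temporary file and its contents are made up for illustration, based on the
sample in the report. In Python 3, text-mode files recognise a bare \r as
end-of-line, so the reader receives one record per line:

```python
import csv
import os
import tempfile

# Hypothetical sample data: a CSV whose records end in a bare '\r'
# (classic Mac style), modelled on the excerpt in the bug report.
raw = (b"BENEFIT,,Subjects relating to all benefits,AB\r"
       b"BENEFIT,PARTNERDIED,Bereavement\r")

fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# newline="" keeps universal-newline line splitting switched on while
# passing the original terminators through to the csv reader untouched.
with open(path, newline="") as f:
    rows = list(csv.reader(f))

os.remove(path)

for row in rows:
    print(row)
```

Opening the same file in binary mode, or with newline="\n", would hand the
reader the whole file as a single "line", which is the situation described
above.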
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2004-11-25 15:23
Message:
Logged In: YES
user_id=44345
This is a known problem. See the April archives of the csv
mailing list:
http://manatee.mojam.com/pipermail/csv/2004-April/thread.html
Solutions are welcome. I suspect any solution will involve either
discarding PyIter_Next altogether or further subdividing what it
returns.
A couple things to note in the way of workarounds:
1. Reader_iternext() defers to PyIter_Next() to grab the next line,
so there's really no opportunity to interject the lineterminator into
the operation with the current code. This means reading from
StringIO objects that use \r lineterminators will always fail.
2. If you have a real file as input and open it in universal newline
mode you will get the correct behavior.
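
For in-memory data, one possible workaround (an assumption on my part, not
something proposed in this thread) is to do the line splitting yourself and
hand the reader a list of lines, since it accepts any iterable of strings.
A sketch in current Python 3, with made-up sample text:

```python
import csv

# Hypothetical in-memory CSV text using bare '\r' line terminators.
text = ("BENEFIT,,Subjects relating to all benefits,AB\r"
        "BENEFIT,PARTNERDIED,Bereavement\r")

# str.splitlines() treats '\r' as a line boundary; keepends=True leaves
# the terminator on each line so the reader sees the original characters.
lines = text.splitlines(keepends=True)

rows = list(csv.reader(lines))

for row in rows:
    print(row)
```

Note that str.splitlines() also splits on a few other separator characters
(form feed, for example), so this is only suitable when the data is known not
to contain them.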
----------------------------------------------------------------------