[Python-Dev] [Python-checkins] cpython: #15927: Fix csv.reader parsing of escaped \r\n with quoting off.
Kristján Valur Jónsson
kristjan at ccpgames.com
Wed Mar 20 04:16:53 CET 2013
The compiler complains about this line:
if (c == '\n' | c=='\r') {
Perhaps you wanted the Boolean operator || rather than the bitwise | ?
-----Original Message-----
From: Python-checkins [mailto:python-checkins-bounces+kristjan=ccpgames.com at python.org] On Behalf Of r.david.murray
Sent: 19 March 2013 19:42
To: python-checkins at python.org
Subject: [Python-checkins] cpython: #15927: Fix csv.reader parsing of escaped \r\n with quoting off.
http://hg.python.org/cpython/rev/940748853712
changeset: 82815:940748853712
parent: 82811:684b75600fa9
user: R David Murray <rdmurray at bitdance.com>
date: Tue Mar 19 22:41:47 2013 -0400
summary:
#15927: Fix csv.reader parsing of escaped \r\n with quoting off.
This fix means that such values are correctly roundtripped, since csv.writer already does the correct escaping.
Patch by Michael Johnson.
files:
Lib/test/test_csv.py | 9 +++++++++
Misc/ACKS | 1 +
Misc/NEWS | 3 +++
Modules/_csv.c | 13 ++++++++++++-
4 files changed, 25 insertions(+), 1 deletions(-)
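
A standalone sketch of the round trip the summary describes, modelled on the test added in this changeset. The helper name `roundtrip` is hypothetical, and `io.StringIO` stands in for the `TemporaryFile` the test uses. On an interpreter that includes this fix, the rows are expected to come back unchanged.

```python
import csv
import io


def roundtrip(rows):
    """Write rows with quoting off, then read them back with the same settings."""
    # newline='' keeps the text layer from translating \r and \n,
    # so the csv module sees exactly what the writer produced.
    buf = io.StringIO(newline='')
    writer = csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar="\\")
    writer.writerows(rows)
    buf.seek(0)
    reader = csv.reader(buf, quoting=csv.QUOTE_NONE, escapechar="\\")
    return list(reader)


# Same data as the new test: embedded \n and \r\n inside unquoted fields.
rows = [['a\nb', 'b'], ['c', 'x\r\nd']]
result = roundtrip(rows)
print(result == rows)
```
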
diff --git a/Lib/test/test_csv.py b/Lib/test/test_csv.py
--- a/Lib/test/test_csv.py
+++ b/Lib/test/test_csv.py
@@ -308,6 +308,15 @@
             for i, row in enumerate(csv.reader(fileobj)):
                 self.assertEqual(row, rows[i])
 
+    def test_roundtrip_escaped_unquoted_newlines(self):
+        with TemporaryFile("w+", newline='') as fileobj:
+            writer = csv.writer(fileobj,quoting=csv.QUOTE_NONE,escapechar="\\")
+            rows = [['a\nb','b'],['c','x\r\nd']]
+            writer.writerows(rows)
+            fileobj.seek(0)
+            for i, row in enumerate(csv.reader(fileobj,quoting=csv.QUOTE_NONE,escapechar="\\")):
+                self.assertEqual(row,rows[i])
+
 class TestDialectRegistry(unittest.TestCase):
     def test_registry_badargs(self):
         self.assertRaises(TypeError, csv.list_dialects, None)
diff --git a/Misc/ACKS b/Misc/ACKS
--- a/Misc/ACKS
+++ b/Misc/ACKS
@@ -591,6 +591,7 @@
Fredrik Johansson
Gregory K. Johnson
Kent Johnson
+Michael Johnson
Simon Johnston
Matt Joiner
Thomas Jollans
diff --git a/Misc/NEWS b/Misc/NEWS
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -289,6 +289,9 @@
Library
-------
+- Issue #15927: CVS now correctly parses escaped newlines and carriage
+ when parsing with quoting turned off.
+
- Issue #17467: add readline and readlines support to mock_open in
unittest.mock.
diff --git a/Modules/_csv.c b/Modules/_csv.c
--- a/Modules/_csv.c
+++ b/Modules/_csv.c
@@ -51,7 +51,7 @@
typedef enum {
START_RECORD, START_FIELD, ESCAPED_CHAR, IN_FIELD,
IN_QUOTED_FIELD, ESCAPE_IN_QUOTED_FIELD, QUOTE_IN_QUOTED_FIELD,
- EAT_CRNL
+ EAT_CRNL,AFTER_ESCAPED_CRNL
} ParserState;
typedef enum {
@@ -644,6 +644,12 @@
         break;
 
     case ESCAPED_CHAR:
+        if (c == '\n' | c=='\r') {
+            if (parse_add_char(self, c) < 0)
+                return -1;
+            self->state = AFTER_ESCAPED_CRNL;
+            break;
+        }
         if (c == '\0')
             c = '\n';
         if (parse_add_char(self, c) < 0)
@@ -651,6 +657,11 @@
         self->state = IN_FIELD;
         break;
 
+    case AFTER_ESCAPED_CRNL:
+        if (c == '\0')
+            break;
+        /*fallthru*/
+
     case IN_FIELD:
         /* in unquoted field */
         if (c == '\n' || c == '\r' || c == '\0') {
--
Repository URL: http://hg.python.org/cpython