[issue21297] skipinitialspace in the csv module only skips spaces, not "whitespace" in general
New submission from Daniel Andersson:
Regarding the `skipinitialspace` parameter to the different CSV reader dialects in the `csv` module, the official documentation asserts:
When True, whitespace immediately following the delimiter is ignored.
and the module docstring (as shown by `help(csv)`) says:
* skipinitialspace - specifies how to interpret whitespace which
immediately follows a delimiter. It defaults to False, which
means that whitespace immediately following a delimiter is part
of the following field.
"Whitespace" is too general in both cases (and at least misleading in the second), since only spaces are skipped, not e.g. tabs [1].
The comments in `Modules/_csv.c` describe the parameter more accurately. At line 81:
int skipinitialspace; /* ignore spaces following delimiter? */
and the actual implementation at line 638:
else if (c == ' ' && dialect->skipinitialspace)
    /* ignore space at start of field */
    ;
Probably no one will assume that the whole Unicode range of "whitespace" is skipped, but I at least initially assumed that the tab character was included.
[1]: http://en.wikipedia.org/wiki/Whitespace_character
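The difference can be checked directly from Python. The following is only a minimal sketch, not part of the original report; the helper name `parse_line` is made up for illustration and the default `,` delimiter is assumed:

```python
import csv


def parse_line(line, skipinitialspace):
    """Parse a single CSV line and return its list of fields.

    Hypothetical helper for illustration; it just wraps csv.reader.
    """
    return next(csv.reader([line], skipinitialspace=skipinitialspace))


# One space, several spaces, and a tab after the delimiter.
for line in ["a, b", "a,   b", "a,\tb"]:
    print(repr(line),
          parse_line(line, skipinitialspace=False),
          parse_line(line, skipinitialspace=True))
```

With `skipinitialspace=True`, the leading spaces in the first two lines should be dropped, while the tab in the third line should remain part of the second field.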
----------
assignee: docs@python
components: Documentation, Library (Lib)
messages: 216780
nosy: Daniel.Andersson, docs@python
priority: normal
severity: normal
status: open
title: skipinitialspace in the csv module only skips spaces, not "whitespace" in general
type: behavior
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 3.5
_______________________________________
Python tracker
Terry J. Reedy added the comment:
Do I understand correctly that only one space is ignored?
----------
nosy: +terry.reedy
stage: -> needs patch
title: skipinitialspace in the csv module only skips spaces, not "whitespace" in general -> csv.skipinitialspace only skips spaces, not "whitespace" in general
versions: -Python 3.1, Python 3.2, Python 3.3
_______________________________________
Daniel Andersson added the comment:
No, multiple spaces are ignored as advertised (I verified this by testing, not just by reading the code). However, only spaces (U+0020) are ignored, not e.g. tabs (U+0009), which also count as "whitespace", along with several other characters.
In light of your followup question, the internal comment at `Modules/_csv.c`, line 639:
/* ignore space at start of field */
could perhaps be clarified to say "spaces" instead of "space", but the code context makes it quite clear, and the comment is not user-facing anyway. The main point of this issue is the wording in the module docstring and the official docs: "whitespace" versus "space".
----------
_______________________________________
Changes by Andy Almonte
Brandon Milam added the comment:
The attached script demonstrates what Daniel Andersson described. I changed the "whitespace" references in the documentation that Daniel mentioned to say "spaces". I also changed "ignore space at the start of the field" to "ignore spaces at the start of the field" to address Terry's question.
Let me know of any errors or extra changes that are needed.
----------
nosy: +jbmilam
Added file: http://bugs.python.org/file39558/csv_skipinitialspace_testing.py
_______________________________________
Changes by Brandon Milam
Berker Peksag added the comment:
The patch looks good to me, thanks! Could you also convert your test script to a test case and add it in Lib/test/test_csv.py?
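For illustration only, a test case along these lines might look roughly like the sketch below. This is not the attached patch; the class and method names are hypothetical, and the assertions simply encode the behavior described earlier in this issue (spaces skipped, tabs kept):

```python
import csv
import unittest


class SkipInitialSpaceTest(unittest.TestCase):
    """Hypothetical test case; names are illustrative, not from the patch."""

    def _read(self, line):
        # Parse one line with skipinitialspace enabled, default ',' delimiter.
        return next(csv.reader([line], skipinitialspace=True))

    def test_multiple_spaces_are_skipped(self):
        self.assertEqual(self._read("a,   b"), ["a", "b"])

    def test_tab_is_not_skipped(self):
        # A tab after the delimiter stays in the field.
        self.assertEqual(self._read("a,\tb"), ["a", "\tb"])
```

It can be run with `python -m unittest` against the file that contains it.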
----------
nosy: +berker.peksag
stage: needs patch -> patch review
versions: +Python 3.6
_______________________________________
Brandon Milam added the comment:
This is my first attempt at working with the test suite, but I believe this is what you were asking for. Because this is my first attempt at writing tests, I have included it as a separate patch file. If any further changes are needed, just let me know.
----------
Added file: http://bugs.python.org/file39732/skipinitialspace_test.patch
_______________________________________
Change by Irit Katriel
participants (6)
- Andy Almonte
- Berker Peksag
- Brandon Milam
- Daniel Andersson
- Irit Katriel
- Terry J. Reedy