[issue21297] skipinitialspace in the csv module only skips spaces, not "whitespace" in general
New submission from Daniel Andersson:

Regarding the `skipinitialspace` parameter to the different CSV reader dialects in the `csv` module, the official documentation asserts:

    When True, whitespace immediately following the delimiter is ignored.

and the `help(csv)` style module documentation says:

    * skipinitialspace - specifies how to interpret whitespace which immediately follows a delimiter. It defaults to False, which means that whitespace immediately following a delimiter is part of the following field.

"Whitespace" is a bit too general in both cases (at least a red herring in the second case), since the reader only skips spaces and not e.g. tabs [1].

The source in `Modules/_csv.c` describes the parameter more accurately. At line 81:

    int skipinitialspace; /* ignore spaces following delimiter? */

and the actual implementation at line 638:

    else if (c == ' ' && dialect->skipinitialspace)
        /* ignore space at start of field */
        ;

Probably no one will assume that the whole Unicode range of "whitespace" is skipped, but at least I initially assumed that the tab character was included.

[1]: http://en.wikipedia.org/wiki/Whitespace_character

----------
assignee: docs@python
components: Documentation, Library (Lib)
messages: 216780
nosy: Daniel.Andersson, docs@python
priority: normal
severity: normal
status: open
title: skipinitialspace in the csv module only skips spaces, not "whitespace" in general
type: behavior
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 3.5

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue21297>
_______________________________________

Terry J. Reedy added the comment:

Do I understand correctly that only one space is ignored?

----------
nosy: +terry.reedy
stage: -> needs patch
title: skipinitialspace in the csv module only skips spaces, not "whitespace" in general -> csv.skipinitialspace only skips spaces, not "whitespace" in general
versions: -Python 3.1, Python 3.2, Python 3.3

Daniel Andersson added the comment:

No, multiple spaces are ignored as advertised (according to actual tests, not just reading the code), but only spaces (U+0020) and not e.g. tabs (U+0009), which are also covered by the term "whitespace", along with several other characters.

In light of your follow-up question, the internal comment at `Modules/_csv.c`, line 639:

    /* ignore space at start of field */

could perhaps be clarified to say "spaces" instead of "space", but the code context makes it quite clear, and the comment is not user-facing anyway.

The main point of this issue is the wording in the module docstring and the official docs, which say "whitespace" where they should say "space".

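The behaviour described above can be reproduced with a minimal sketch. This is not one of the files attached to the issue; the helper name `parse_line` and the sample strings are made up for illustration, and the expected values assume a comma delimiter with otherwise default dialect settings.

```python
import csv


def parse_line(line, skip):
    """Parse one CSV line and return its fields (hypothetical helper)."""
    return next(csv.reader([line], skipinitialspace=skip))


# One space after the delimiter is dropped.
single = parse_line("a, b", True)        # ['a', 'b']

# Several spaces after the delimiter are all dropped.
multiple = parse_line("a,    b", True)   # ['a', 'b']

# A tab after the delimiter is kept as part of the field.
tab = parse_line("a,\tb", True)          # ['a', '\tb']

# With the default (False), the space stays in the field.
default = parse_line("a, b", False)      # ['a', ' b']

print(single, multiple, tab, default)
```

Only U+0020 is skipped, so callers who also want tabs removed would have to strip each field themselves after parsing.
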
Changes by Andy Almonte <andy.almonte@gmail.com>:

----------
nosy: +Andy.Almonte

Brandon Milam added the comment:

This code shows what Daniel Andersson was talking about. I changed the "whitespace" references in the documentation that Daniel mentioned to say "spaces". I also changed "ignore space at the start of the field" to "ignore spaces at the start of the field" in light of Terry's question. Let me know of any errors or extra changes that are needed.

----------
nosy: +jbmilam
Added file: http://bugs.python.org/file39558/csv_skipinitialspace_testing.py

Changes by Brandon Milam <jmilam343@gmail.com>:

Added file: http://bugs.python.org/file39559/csv_skipinitialspace_testing.csv

Changes by Brandon Milam <jmilam343@gmail.com>:

----------
keywords: +patch
Added file: http://bugs.python.org/file39560/csv_skipinitialspace_docfix.patch

Berker Peksag added the comment:

The patch looks good to me, thanks! Could you also convert your test script to a test case and add it in Lib/test/test_csv.py?

----------
nosy: +berker.peksag
stage: needs patch -> patch review
versions: +Python 3.6

Brandon Milam added the comment:

This is my first attempt at working with the test suite, but I believe this is what you were asking for. Because this is my first attempt at writing tests, I have included it as a separate patch file. If any further changes are needed, just let me know.

----------
Added file: http://bugs.python.org/file39732/skipinitialspace_test.patch

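For illustration only, a test case of the kind requested might look like the sketch below. It does not reproduce the attached `skipinitialspace_test.patch`; the class name, method names, and sample inputs are all assumptions, and the assertions cover only the space-versus-tab behaviour discussed in this issue.

```python
import csv
import unittest


class SkipInitialSpaceSketch(unittest.TestCase):
    """Hypothetical test case; not the patch attached to the issue."""

    def _read(self, line, **kwargs):
        # Parse a single line and return its list of fields.
        return next(csv.reader([line], **kwargs))

    def test_spaces_are_skipped(self):
        # Every leading space after the delimiter is dropped.
        self.assertEqual(
            self._read("a,   b", skipinitialspace=True), ["a", "b"])

    def test_tab_is_kept(self):
        # A tab is not treated as a skippable space.
        self.assertEqual(
            self._read("a,\tb", skipinitialspace=True), ["a", "\tb"])

    def test_default_keeps_space(self):
        # skipinitialspace defaults to False, so the space is kept.
        self.assertEqual(self._read("a, b"), ["a", " b"])
```

Saved to a file, the sketch can be run with `python -m unittest <file>`.
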
Change by Irit Katriel <iritkatriel@yahoo.com>:

----------
versions: +Python 3.10, Python 3.8, Python 3.9 -Python 2.7, Python 3.4, Python 3.5, Python 3.6

participants (6)
- Andy Almonte
- Berker Peksag
- Brandon Milam
- Daniel Andersson
- Irit Katriel
- Terry J. Reedy