pandas read_csv
Peter Otten
__peter__ at web.de
Fri Nov 9 12:15:37 EST 2018
Sharan Basappa wrote:
> are there any requirements about the format of the CSV file when using
> read_csv from pandas? For example, is it necessary that the csv file has
> to have same number of columns in every line etc.
> ParserError: Error tokenizing data. C error: Expected 1 fields in line 8,
> saw 3
The error message is quite clear: look for extra fields in line 8 of your
data ;)
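If the offending line is not obvious by eye, a small helper can list every
line whose field count differs from the first line. This is only a sketch
using the standard library's csv module rather than pandas; the name
find_ragged_lines and the sample string are made up for illustration, and a
comma delimiter is assumed.

```python
import csv
import io


def find_ragged_lines(text, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs
    from the first row. Hypothetical helper, not part of pandas."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    expected = None
    ragged = []
    for row in reader:
        if expected is None:
            # The first row (usually the header) sets the expected width.
            expected = len(row)
        elif len(row) != expected:
            # reader.line_num is the 1-based number of the line just read.
            ragged.append((reader.line_num, len(row)))
    return ragged


sample = "foo,bar\n1,2\n3,4,5\n6,7\n"
print(find_ragged_lines(sample))  # [(3, 3)]
```

For a real file you would pass the file's contents (or adapt the helper to
take an open file object) and compare the reported line numbers with the one
in the pandas error message.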
Now let's run a few experiments:
>>> import pandas, io
>>> def dump(s):
...     return pandas.read_csv(io.StringIO(s))
...
>>> dump("""foo,bar
... 1,2
... """
... )
   foo  bar
0    1    2

[1 rows x 2 columns]
>>> dump("""foo,bar
... 1,2,3
... 4,5
... """)
   foo  bar
1    2    3
4    5  NaN

[2 rows x 2 columns]
>>> dump("""foo,bar
... 1,2
... 3,4,5
... """)
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "<stdin>", line 2, in dump
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 420, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 225, in _read
    return parser.read()
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 626, in read
    ret = self._engine.read(nrows)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1070, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 727, in pandas.parser.TextReader.read (pandas/parser.c:6937)
  File "parser.pyx", line 749, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:7156)
  File "parser.pyx", line 802, in pandas.parser.TextReader._read_rows (pandas/parser.c:7757)
  File "parser.pyx", line 789, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:7640)
  File "parser.pyx", line 1697, in pandas.parser.raise_parser_error (pandas/parser.c:19092)
pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 3
From this I infer that no row in the CSV file may contain more columns than
the first data row. Missing columns are filled with NaN automatically.
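The same inference can be checked in a script rather than at the prompt. The
following is a minimal sketch, assuming a reasonably recent pandas where the
exception is exposed as pandas.errors.ParserError; try_parse is a made-up
helper name.

```python
import io

import pandas as pd


def try_parse(text):
    """Return the parsed DataFrame, or the ParserError if parsing fails.
    Hypothetical helper for illustration only."""
    try:
        return pd.read_csv(io.StringIO(text))
    except pd.errors.ParserError as exc:
        return exc


# A row with too few fields is accepted; the missing value becomes NaN.
short_row = try_parse("foo,bar\n1,2\n3\n")

# A row with more fields than the rows before it is rejected.
long_row = try_parse("foo,bar\n1,2\n3,4,5\n")

print(type(short_row).__name__)
print(type(long_row).__name__)
```

Returning the exception instead of re-raising it is just a convenience for
comparing both cases side by side; in real code you would normally let it
propagate or log it.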
There is also an option to suppress rows containing too many columns:
>>> pandas.read_csv(io.StringIO("foo,bar\n1,2\n3,4,5\n6,7"),
...                 error_bad_lines=False)
b'Skipping line 3: expected 2 fields, saw 3\n'
   foo  bar
0    1    2
1    6    7

[2 rows x 2 columns]
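A note for readers on newer pandas: error_bad_lines was deprecated in pandas
1.3 and removed in 2.0, where the on_bad_lines parameter takes its place. A
minimal sketch of the equivalent call, assuming pandas 1.3 or later:

```python
import io

import pandas as pd

data = "foo,bar\n1,2\n3,4,5\n6,7"

# on_bad_lines="skip" drops rows with too many fields without reporting them;
# on_bad_lines="warn" skips them too but emits a warning for each one.
df = pd.read_csv(io.StringIO(data), on_bad_lines="skip")
print(df)
```

As with error_bad_lines=False, the line "3,4,5" is dropped and only the rows
1,2 and 6,7 remain.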