[Numpy-discussion] Help loading data into pandas

Vincent Davis vincent at vincentdavis.net
Wed May 13 17:14:39 EDT 2015

I have a large (~400 MB) CSV file that I am trying to open in pandas. When I
don't specify the dtype and open it with the following command, it appears
to work.

df = pd.io.parsers.read_csv(CSVFILECLEAN2013, quotechar='"',
                            low_memory=False, na_values='')

If I try to specify the dtype for each field, I get an error but no hint as
to where I should look. I have "cleaned" the CSV by checking that all
values that should be an int or a float are either blank or can be cast to
a float or an int. I guess my question is: can I get a more useful error
message, or is there a hint as to where the problem is that I am not seeing?

Exception                                 Traceback (most recent call last)
<ipython-input-2-8715d8cbaa54> in <module>()
      3 import load_data
      4 import numpy as np
----> 5 df2 = load_data.load('jeffco_2013')

/Users/vmd/GitHub/Jeffco-Properties/tools/load_data.py in load(data)
     47 def load(data):
     48     files = dict(jeffco_2013 =
---> 49     return pd.io.parsers.read_csv(files[data], quotechar='"',
low_memory=False, na_values='', dtype=DATASHAPE)

in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote,
escapechar, quotechar, quoting, skipinitialspace, lineterminator, header,
index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values,
na_fvalues, true_values, false_values, delimiter, converters, dtype,
usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints,
use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines,
keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col,
dayfirst, date_parser, memory_map, float_precision, nrows, iterator,
chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols,
infer_datetime_format, skip_blank_lines)
    468                     skip_blank_lines=skip_blank_lines)
--> 470         return _read(filepath_or_buffer, kwds)
    472     parser_f.__name__ = name

in _read(filepath_or_buffer, kwds)
    254         return parser
--> 256     return parser.read()
    258 _parser_defaults = {

in read(self, nrows)
    713                 raise ValueError('skip_footer not supported for
--> 715         ret = self._engine.read(nrows)
    717         if self.options.get('as_recarray'):

in read(self, nrows)
   1163         try:
-> 1164             data = self._reader.read(nrows)
   1165         except StopIteration:
   1166             if nrows is None:

pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:7426)()

pandas/parser.pyx in pandas.parser.TextReader._read_rows

pandas/parser.pyx in pandas.parser.TextReader._convert_column_data

pandas/parser.pyx in pandas.parser.TextReader._convert_tokens

pandas/parser.pyx in pandas.parser.TextReader._convert_with_dtype

Exception: Integer column has NA values
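
The final message suggests that at least one column declared as an integer in
the dtype mapping contains blank (NA) fields, which a plain NumPy integer
column cannot hold. One way to find the offending columns is to read the file
as text and count the missing values in every int-declared column. The sketch
below is a minimal illustration under stated assumptions: SAMPLE_CSV,
SAMPLE_DTYPES and int_columns_with_na are hypothetical names standing in for
the real file and for DATASHAPE, whose contents are not shown in the post.

```python
import io

import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical stand-ins for the real CSV file and the DATASHAPE mapping.
SAMPLE_CSV = (
    'parcel,year_built,value\n'
    '"A1",1990,100.5\n'
    '"A2",,200.0\n'
    '"A3",2001,\n'
)
SAMPLE_DTYPES = {"parcel": str, "year_built": int, "value": float}


def int_columns_with_na(source, dtypes, **read_kwargs):
    """Return {column: NA count} for int-declared columns that contain NAs."""
    # Reading every column as text avoids the integer conversion that fails;
    # blank fields still come back as NaN under the default NA handling.
    raw = pd.read_csv(source, dtype=str, **read_kwargs)
    problems = {}
    for column, dtype in dtypes.items():
        if column in raw.columns and is_integer_dtype(dtype):
            na_count = int(raw[column].isna().sum())
            if na_count:
                problems[column] = na_count
    return problems


# Only the int-declared column with a blank field is reported.
print(int_columns_with_na(io.StringIO(SAMPLE_CSV), SAMPLE_DTYPES, quotechar='"'))
```

Columns reported this way could then be declared as float instead of int, or,
in pandas versions that provide it, as the nullable "Int64" extension dtype.
Float columns with blanks do not need this treatment because NaN is a valid
float value.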

Vincent Davis