[Numpy-discussion] Help loading data into pandas

Nathaniel Smith njs at pobox.com
Wed May 13 17:27:39 EDT 2015


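I don't think pandas allows blank (NA) values in integer columns, which is
what the "Integer column has NA values" exception at the bottom of your
traceback is saying. You might get better results asking on the pandas
list, though -- see
  http://pandas.pydata.org/community.html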
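
In case it helps narrow things down, here is a minimal sketch of one way to
find which integer columns contain blanks, and to fall back to float64 for
those. The sample text, the column names, and the WANTED mapping are
hypothetical stand-ins for your file and your DATASHAPE dict; I have not run
this against your data.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file; "units" has a blank.
CSV_TEXT = "parcel,units,value\n1,2,100.5\n2,,200.0\n3,4,\n"

# Hypothetical dtype mapping, playing the role of DATASHAPE in the post.
WANTED = {"parcel": "int64", "units": "int64", "value": "float64"}


def int_columns_with_blanks(text, wanted):
    """Return the integer-typed columns in `wanted` that contain missing values."""
    # Let pandas infer dtypes so that blanks become NaN instead of raising.
    df = pd.read_csv(io.StringIO(text))
    int_cols = [c for c, t in wanted.items() if t.startswith("int")]
    return [c for c in int_cols if df[c].isna().any()]


bad = int_columns_with_blanks(CSV_TEXT, WANTED)
print(bad)

# One fallback: read the offending columns as float64, which can hold NaN.
safe = {c: ("float64" if c in bad else t) for c, t in WANTED.items()}
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=safe)
print(df.dtypes)
```

The idea is to let the first pass infer dtypes so nothing raises, then use
the result only to decide which columns cannot safely be declared as
integers.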

-n
On May 13, 2015 2:17 PM, "Vincent Davis" <vincent at vincentdavis.net> wrote:

> I have a large (~400 MB) CSV file I am trying to open in pandas. When I
> don't specify the dtype and open it with the following command, it appears
> to work.
>
> df = pd.io.parsers.read_csv(CSVFILECLEAN2013, quotechar='"',
>                             low_memory=False, na_values='')
>
> If I try to specify the dtype for each field, I get an error but no hint as
> to where I should look. I have "cleaned" the CSV by checking that all
> values that should be an int or a float are either blank or can be cast to
> a float or an int. I guess my question is: can I get a more useful error
> message, or is there a hint as to where the problem is that I am not seeing?
>
> Exception                                 Traceback (most recent call last)
> <ipython-input-2-8715d8cbaa54> in <module>()
>       3 import load_data
>       4 import numpy as np
> ----> 5 df2 = load_data.load('jeffco_2013')
>
> /Users/vmd/GitHub/Jeffco-Properties/tools/load_data.py in load(data)
>      47 def load(data):
>      48     files = dict(jeffco_2013 =
> '/Users/vmd/GitHub/Jeffco-Properties/Data/JeffersonCo/Datasets/2013_clean_Jeffco_ATSDTA_ATSP600.csv')
> ---> 49     return pd.io.parsers.read_csv(files[data], quotechar='"',
> low_memory=False, na_values='', dtype=DATASHAPE)
>
> /Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py
> in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote,
> escapechar, quotechar, quoting, skipinitialspace, lineterminator, header,
> index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values,
> na_fvalues, true_values, false_values, delimiter, converters, dtype,
> usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints,
> use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines,
> keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col,
> dayfirst, date_parser, memory_map, float_precision, nrows, iterator,
> chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols,
> infer_datetime_format, skip_blank_lines)
>     468                     skip_blank_lines=skip_blank_lines)
>     469
> --> 470         return _read(filepath_or_buffer, kwds)
>     471
>     472     parser_f.__name__ = name
>
> /Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py
> in _read(filepath_or_buffer, kwds)
>     254         return parser
>     255
> --> 256     return parser.read()
>     257
>     258 _parser_defaults = {
>
> /Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py
> in read(self, nrows)
>     713                 raise ValueError('skip_footer not supported for
> iteration')
>     714
> --> 715         ret = self._engine.read(nrows)
>     716
>     717         if self.options.get('as_recarray'):
>
> /Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py
> in read(self, nrows)
>    1162
>    1163         try:
> -> 1164             data = self._reader.read(nrows)
>    1165         except StopIteration:
>    1166             if nrows is None:
>
> pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:7426)()
>
> pandas/parser.pyx in pandas.parser.TextReader._read_rows
> (pandas/parser.c:8484)()
>
> pandas/parser.pyx in pandas.parser.TextReader._convert_column_data
> (pandas/parser.c:9795)()
>
> pandas/parser.pyx in pandas.parser.TextReader._convert_tokens
> (pandas/parser.c:10403)()
>
> pandas/parser.pyx in pandas.parser.TextReader._convert_with_dtype
> (pandas/parser.c:11257)()
>
> Exception: Integer column has NA values
>
> Vincent Davis
> 720-301-3003