IndexError for using pandas dataframe values
Peter Otten
__peter__ at web.de
Sat May 28 03:25:38 EDT 2016
Peter Otten wrote:
> Daiyue Weng wrote:
>
>> Hi, I tried to use DataFrame.values to convert a list of columns in a
>> dataframe to a numpy ndarray/matrix,
>>
>> matrix = df.values[:, list_of_cols]
>>
>> but got an error,
>>
>> IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis
>> (None) and integer or boolean arrays are valid indices
>>
>> so what's the problem with the list of columns I passed in?
>>
>> many thanks
>
> Your suggestively named list_of_cols is probably not a list. Have your
> script print its value and type before the failing operation:
>
> print(type(list_of_cols), list_of_cols)
>> matrix = df.values[:, list_of_cols]
On Thu May 26 2016, 09:21:59, Daiyue Weng wrote:
[If you had sent this to the list I would have seen it earlier.
Just in case you didn't solve the problem in the meantime:]
> it prints
>
> <class 'list'> ['key1', 'key2']
So my initial assumption was wrong -- list_of_cols is a list. However,
df.values is a numpy array and therefore expects integer indices:
>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns="key1 key2 key3".split())
>>> df
   key1  key2  key3
0     1     2     3
1     4     5     6

[2 rows x 3 columns]
>>> df.values
array([[1, 2, 3],
       [4, 5, 6]])
>>> df.values[["key1", "key2"]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'key1'
(I get a different error message, probably because we use different versions
of numpy)
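
The difference between labels and positions can be checked without pandas. The following is a minimal sketch and not part of the original exchange: try_index is a hypothetical helper that reports which exception numpy raises, and it catches both IndexError and ValueError because the exception type seems to depend on the numpy version.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])


def try_index(array, index):
    """Select columns of a 2-D array; on failure return the exception's name.

    Hypothetical helper for illustration only.
    """
    try:
        return array[:, index]
    except (IndexError, ValueError) as exc:
        # The exact exception type differs between numpy versions.
        return type(exc).__name__


# String labels mean nothing to a plain numpy array.
by_label = try_index(arr, ["key1", "key2"])
# Integer positions work.
by_position = try_index(arr, [0, 1])

print(by_label)
print(by_position)
```

Only the DataFrame knows which label belongs to which column; the array returned by df.values carries no column names at all.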
To fix the problem you can either use integers
>>> df.values[:,[0, 1]]
array([[1, 2],
       [4, 5]])
or select the columns in pandas:
>>> df[["key1", "key2"]].values
array([[1, 2],
       [4, 5]])
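
If the rest of a script needs integer positions, the labels can be translated first. This is a sketch under the assumption that the column labels are unique; list_of_cols mirrors the name used in the question, and Index.get_indexer returns -1 for any label it cannot find, which the check below guards against.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns="key1 key2 key3".split())
list_of_cols = ["key1", "key2"]

# Translate column labels into integer positions.
positions = df.columns.get_indexer(list_of_cols)
if (positions == -1).any():
    # get_indexer marks unknown labels with -1 instead of raising.
    raise KeyError("some columns are missing from the DataFrame")

# Index the numpy array by position ...
matrix_by_position = df.values[:, positions]
# ... or let pandas select by label before converting.
matrix_by_label = df[list_of_cols].values

print(positions)
print(matrix_by_position)
```

Both routes should give the same array; selecting in pandas first is shorter and reports a missing column by itself.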