[scikit-learn] Creating dataset

Mahmood Naderan mahmood.nt at gmail.com
Sun Nov 8 07:42:38 EST 2020


Thanks for the replies.

> I'd recommend just reading that csv file with e.g. pandas
> (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html),
> and then just use the dataframe as input to scikit-learn utilities (you may need to
> separate the features X from the target y).


I am trying to follow the steps as described in
https://towardsdatascience.com/a-step-by-step-introduction-to-pca-c0d78e26a0dd

I changed

iris = load_iris()
colors = ["blue","red","green"]
df = DataFrame(
    data=np.c_[iris["data"], iris["target"]],
    columns=iris["feature_names"] + ["target"])

to

data_file = pd.read_csv("mydata.csv")
colors = ["blue","red","green","skyblue","indigo","plum","coral","orange","gray","lime"]
df = DataFrame(
    data=np.c_[data_file["data"], data_file["target"]],
    columns=data_file["feature_names"] + ["target"])


But I get this error:

Traceback (most recent call last):
  File "/home/mahmood/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'data'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "pca_gromacs.py", line 12, in <module>
    data=np.c_[data_file["data"], data_file["target"]], columns=data_file["feature_names"] + ["target"]
  File "/home/mahmood/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/mahmood/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 'data'



It seems that load_iris() does more than read_csv().
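
To illustrate the difference, here is a minimal sketch. It assumes a
hypothetical CSV layout (feature columns "f1" and "f2" plus a "target"
column) as a stand-in for mydata.csv, whose real header is not shown here.
load_iris() returns a dict-like object with entries such as "data",
"target" and "feature_names", while read_csv() returns a DataFrame keyed
only by the column names found in the file, which would explain the
KeyError: 'data'.

```python
import io

import pandas as pd
from sklearn.datasets import load_iris

# load_iris() returns a Bunch: a dict-like container with fixed keys.
iris = load_iris()
bunch_keys = sorted(iris.keys())  # includes "data", "target", "feature_names"

# read_csv() returns a DataFrame keyed by the CSV's own column names.
# Hypothetical stand-in for mydata.csv (the column names are assumptions).
csv_text = "f1,f2,target\n1.0,2.0,0\n3.0,4.0,1\n5.0,6.0,2\n"
data_file = pd.read_csv(io.StringIO(csv_text))

# There is no column called "data", so data_file["data"] would raise KeyError.
has_data_column = "data" in data_file.columns

# The DataFrame already has the layout the tutorial builds by hand,
# so it can be used directly and split into features and target.
df = data_file
feature_names = [c for c in df.columns if c != "target"]
X = df[feature_names].to_numpy()
y = df["target"].to_numpy()
```

If the real file names its label column differently, "target" above would
need to be replaced with that name.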

Regards,
Mahmood