[scikit-learn] Creating dataset

Mahmood Naderan mahmood.nt at gmail.com
Sun Nov 8 09:19:18 EST 2020


>You need to understand what the different statements are doing; just
>as you need to understand what processing you apply on your data
>(whether it's preprocessing or learning) to properly use any machine
>learning tool.

I know, but the problem is that the iris csv file doesn't contain that
information, and as I said, I think there are some additional steps
involved, though I don't know exactly what they are.

For example, if you look at
~/.local/lib/python3.6/site-packages/sklearn/datasets/data/iris.csv you
will see

150,4,setosa,versicolor,virginica
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
...

So the first line means 150 instances (rows), 4 feature columns, and the
names of the three iris types; the last value in each data row is the
class index.
However, when I use

iris = load_iris()
print(iris)

I see a lot of metadata, such as:

{'data': array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
...
       [5.9, 3. , 5.1, 1.8]]), 'target': array([0, 0,...]), 'frame': None,
'target_names': array(['setosa', 'versicolor', 'virginica'], dtype='<U10'),
'DESCR': '.. _iris_dataset:\n\nIris plants
dataset\n--------------------\n\n**Data Set Characteristics:**\n\n
 :Number of Instances: 150 (50 in each of three classes)\n    :Number of
Attributes: 4 numeric, predictive attributes and the class\n    :Attribute
Information:\n        - sepal length in cm\n        - sepal width in cm\n
     - petal length in cm\n        - petal width in cm\n        - class:\n
               - Iris-Setosa\n                - Iris-Versicolour\n
       - Iris-Virginica\n                \n


My question is how this metadata is created and stored in the package.
I mean, what does

from sklearn.datasets import load_iris

do with the csv file? If I knew that, I would be able to create a similar
dataset myself.
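
To make the question concrete, here is a minimal sketch of a loader for a
file in the layout shown above. It is only a guess at the kind of processing
involved, not what load_iris actually does. The function name
load_header_csv and its descr argument are hypothetical, and it returns a
plain dict instead of the object that scikit-learn returns.

```python
import csv
import io


def load_header_csv(fileobj, descr=""):
    """Parse a CSV whose first row is: n_samples, n_features, class names...

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    reader = csv.reader(fileobj)

    # Header row, e.g. "150,4,setosa,versicolor,virginica"
    header = next(reader)
    n_samples, n_features = int(header[0]), int(header[1])
    target_names = header[2:]

    data, target = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        # First n_features values are measurements, the next one is the class index
        data.append([float(value) for value in row[:n_features]])
        target.append(int(row[n_features]))

    if len(data) != n_samples:
        raise ValueError(
            "header declares %d rows, found %d" % (n_samples, len(data))
        )

    # The description text is assumed to be supplied separately by the caller,
    # since the csv file itself does not contain it.
    return {
        "data": data,
        "target": target,
        "target_names": target_names,
        "DESCR": descr,
    }


# Toy two-row sample in the same layout as iris.csv
sample = io.StringIO(
    "2,4,setosa,versicolor,virginica\n"
    "5.1,3.5,1.4,0.2,0\n"
    "4.9,3.0,1.4,0.2,0\n"
)
dataset = load_header_csv(sample, descr="Two-row toy sample")
print(dataset["target_names"])
print(dataset["data"])
```

If something like this is the right idea, the lists could presumably be
converted to numpy arrays and the dict wrapped in sklearn.utils.Bunch to get
attribute-style access such as dataset.data.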


Regards,
Mahmood