[scikit-learn] a dataset suitable for logistic regression
Francois Dion
francois.dion at gmail.com
Mon Dec 4 08:02:09 EST 2017
There's at least one that is part of base.py in sklearn.datasets.
from sklearn.datasets import load_breast_cancer
load_breast_cancer?
Signature: load_breast_cancer(return_X_y=False)
Docstring:
Load and return the breast cancer wisconsin dataset (classification).
The breast cancer dataset is a classic and very easy binary classification
dataset.
=================   ==============
Classes                          2
Samples per class    212(M),357(B)
Samples total                  569
Dimensionality                  30
Features            real, positive
=================   ==============
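As a rough sketch of how this dataset could be used for logistic regression: the train/test split, the StandardScaler step and the variable names below are my own choices for illustration, not something the docstring prescribes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# return_X_y=True gives the feature matrix and the 0/1 target directly.
X, y = load_breast_cancer(return_X_y=True)

# Hold out part of the data; stratify keeps the class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Scaling is an assumption made here: the 30 features have very different
# ranges, and standardizing them helps the solver converge.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(X.shape, accuracy)
```

The target is already binary (malignant vs. benign), so no relabeling is needed.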
The breast cancer dataset is very small. If you need something much
larger, you can easily create large (artificial) datasets using
make_classification, and you can augment them with faker or elizabeth
(PyPI modules, not part of scikit-learn) to create realistic-looking
data.
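A minimal sketch of the make_classification route, assuming you want a binary problem with the same number of features as the breast cancer data. The sample count, n_informative and the other parameter values are arbitrary picks for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Generate a large artificial binary classification problem.
# All sizes below are illustrative assumptions; adjust to taste.
X_big, y_big = make_classification(
    n_samples=100000,
    n_features=30,
    n_informative=10,
    n_classes=2,
    random_state=0,
)

# Fit a plain logistic regression; max_iter is raised as a precaution
# against convergence warnings.
model = LogisticRegression(max_iter=1000).fit(X_big, y_big)
print(X_big.shape, model.score(X_big, y_big))
```

The generated features are purely numeric; faker or elizabeth would only come in if you want to attach realistic-looking names, addresses and so on to the rows.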
Francois
about.me/francois.dion
On Sun, Dec 3, 2017 at 3:54 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> Hi, iris is a three-class dataset. Is there a dataset in sklearn that
> is suitable for binary classification? Thanks.
>
> --
> Regards,
> Peng