[scikit-learn] a dataset suitable for logistic regression

Francois Dion francois.dion at gmail.com
Mon Dec 4 08:02:09 EST 2017


There's at least one, defined in base.py in sklearn.datasets:

from sklearn.datasets import load_breast_cancer

load_breast_cancer?

Signature: load_breast_cancer(return_X_y=False)
Docstring:
Load and return the breast cancer wisconsin dataset (classification).

The breast cancer dataset is a classic and very easy binary classification
dataset.

=================   ==============
Classes                          2
Samples per class    212(M),357(B)
Samples total                  569
Dimensionality                  30
Features            real, positive
=================   ==============
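
As a minimal sketch of how this dataset could be used for logistic regression (the train/test split, the scaling step, and the random_state value are my own assumptions, not anything the docstring prescribes):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# return_X_y=True returns the feature matrix and the binary (0/1) target
X, y = load_breast_cancer(return_X_y=True)
print(X.shape)  # (569, 30)

# Hold out a quarter of the samples, keeping the class ratio in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Standardize the features first so the solver converges cleanly
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

# Mean accuracy on the held-out samples
print(clf.score(X_test, y_test))
```

The scaling step is optional, but the 30 features are on very different scales, so without it the solver may warn about convergence.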

It is a very small dataset. If you need something much larger, you can
easily create large (artificial) datasets with make_classification, and
you can augment them with faker or elizabeth (PyPI modules, not part of
scikit-learn) to create realistic-looking datasets.
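
For the make_classification route, something along these lines should work. The sample count and the feature settings below are arbitrary values picked for illustration; tune them to whatever you need:

```python
from sklearn.datasets import make_classification

# Generate a synthetic binary classification problem.
# All sizes here are illustrative assumptions.
X_big, y_big = make_classification(
    n_samples=100000,   # far more rows than the 569 in breast cancer
    n_features=30,      # same dimensionality as the breast cancer data
    n_informative=10,   # features that actually carry class signal
    n_redundant=5,      # linear combinations of the informative features
    n_classes=2,        # binary target, suitable for logistic regression
    random_state=0,     # fixed seed so the data is reproducible
)

print(X_big.shape)
print(y_big[:10])
```

The remaining 15 features are pure noise, which is handy if you also want to try out regularization or feature selection.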

Francois



about.me/francois.dion


On Sun, Dec 3, 2017 at 3:54 PM, Peng Yu <pengyu.ut at gmail.com> wrote:

> Hi, iris is a three-class dataset. Is there a dataset in sklearn that
> is suitable for binary classification? Thanks.
>
> --
> Regards,
> Peng