[scikit-learn] About the Boston housing prices dataset
adrin.jalali at gmail.com
Wed Oct 14 04:34:22 EDT 2020
Most of those are not talking about the ethical issues of the dataset.
Let's talk about the alternatives we have:
Keep the loader, but raise a warning:
- Most people will not change their code/material and, IMO, will mostly
ignore the warning. Some people may see the warning and care about it.
Deprecate, and point users to an alternative dataset; if they really,
really want the same dataset, point them to the openml ID:
- People will have to change something, and if we give them a nice
copy/paste-able alternative which is not boston,
they'll use that instead.
- Some people will keep using boston from openml, and not care about the
As an addition, we can keep the load_boston in the docs only, and point
users to alternatives even after removing
On Wed, Oct 14, 2020 at 10:11 AM Olivier Grisel <olivier.grisel at ensta.org> wrote:
> On Tue, Oct 13, 2020 at 16:19, Adrin <adrin.jalali at gmail.com> wrote:
> > Isn't the Boston dataset available through openml? Maybe here:
> > I'm happy to have the dataset out there on openml, and for any material
> that addresses some of the issues with it.
> > But for educational purposes, we don't need to have the dataset in the
> package as long as users can still download it
> > with a one-liner using fetch_openml.
> That would be an argument in favor of a deprecation warning with a
> message stating the motivation for deprecation and pointing to
> However, it's going to break examples written in slow-to-update
> tutorials or books once the deprecation period is over. But one could
> argue that this is also the case for any other deprecation in
> scikit-learn. It's just that sklearn.datasets.load_boston is used A
> LOT: https://github.com/search?q=load_boston&type=code
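
To make the deprecation option discussed above concrete, here is a minimal sketch of a loader that warns on every call and points users elsewhere. The decorator name `deprecated_loader`, the message wording, and the stand-in `load_boston` body are all hypothetical and written only for illustration; this is not scikit-learn's actual implementation or message.

```python
import functools
import warnings


def deprecated_loader(message):
    """Wrap a dataset loader so each call emits a FutureWarning first.

    Hypothetical helper for illustration only.
    """
    def decorator(loader):
        @functools.wraps(loader)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller's line,
            # so users see where in their own code the loader is used.
            warnings.warn(message, category=FutureWarning, stacklevel=2)
            return loader(*args, **kwargs)
        return wrapper
    return decorator


# Assumed wording; not the message scikit-learn actually ships.
BOSTON_MESSAGE = (
    "load_boston is deprecated and will be removed in a future release "
    "because of ethical issues with the dataset. Consider an alternative "
    "dataset; the original data remains available through "
    "sklearn.datasets.fetch_openml."
)


@deprecated_loader(BOSTON_MESSAGE)
def load_boston():
    # Stand-in for the real loader: returns a tiny placeholder, not real data.
    return {"data": [], "target": []}
```

Calling `load_boston()` here still returns its result, so existing code keeps running during the deprecation period, but every call surfaces the motivation and the suggested replacement.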