<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi</p>
<p>As was recently mentioned in PR #18594, the problem with the
Boston housing dataset does not go away just because we remove it
from scikit-learn. On the contrary, it is a valuable dataset for
showing and teaching bias and discrimination, in particular because
we have access to the variable "B". Issue #16715 is still
waiting for someone to write such an example.<br>
</p>
<p>Most, if not all, of the datasets in scikit-learn are available
elsewhere, even in Python. So I don't think availability elsewhere
is a good argument for removal either.<br>
</p>
<p>As we've now removed it from tests and examples, the question for
me is: what more do we want to achieve?<br>
The answers I can think of go down a political road...</p>
<p>I'm fine with Olivier's suggestion <a moz-do-not-send="true"
href="https://github.com/scikit-learn/scikit-learn/pull/18594#issuecomment-707626543">https://github.com/scikit-learn/scikit-learn/pull/18594#issuecomment-707626543</a>.<br>
</p>
<p><br>
All the best,<br>
Christian<br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 14.10.20 10:34, Adrin wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAEOrW48s=HXY2==LCfc5t0vAc-Rf7UnQYV9Jbo=8wJ_U+VBQ-g@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div class="gmail_default" style="font-size:small">Most of those
are not talking about the ethical issues of the dataset. Let's
talk about the alternatives we have:</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">Keep the
loader, but raise a warning:</div>
<div class="gmail_default" style="font-size:small">- this will
result in most people not changing their code/material and,
IMO, mostly ignoring the warning. Some</div>
<div class="gmail_default" style="font-size:small">people may
see the warning and care about it.</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">Deprecate,
and point them to an alternative dataset, and if they really
really want the same dataset, point them</div>
<div class="gmail_default" style="font-size:small">to the openml
ID:</div>
<div class="gmail_default" style="font-size:small">- People will
have to change something, and if we give them a nice
copy/paste-able alternative which is not boston,</div>
<div class="gmail_default" style="font-size:small">they'll use
that instead.</div>
<div class="gmail_default" style="font-size:small">- Some people
will keep using boston from openml, and not care about the
ethical implications</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">In
addition, we can keep load_boston in the docs only, and
point users to alternatives even after removing</div>
<div class="gmail_default" style="font-size:small">the loader.<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Oct 14, 2020 at 10:11
AM Olivier Grisel <<a
href="mailto:olivier.grisel@ensta.org"
moz-do-not-send="true">olivier.grisel@ensta.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On
Tue, Oct 13, 2020 at 4:19 PM, Adrin <<a
href="mailto:adrin.jalali@gmail.com" target="_blank"
moz-do-not-send="true">adrin.jalali@gmail.com</a>>
wrote:<br>
><br>
> Isn't the Boston dataset available through openml? Maybe
here: <a href="https://www.openml.org/d/531" rel="noreferrer"
target="_blank" moz-do-not-send="true">https://www.openml.org/d/531</a><br>
><br>
> I'm happy to have the dataset out there on OpenML, and
for any material that addresses some of the issues with it.<br>
> But for educational purposes, we don't need to have the
dataset in the package as long as users can still download it<br>
> with a one-liner using fetch_openml.<br>
<br>
That would be an argument in favor of deprecation warning with
a<br>
message stating the motivation for deprecation and pointing to<br>
fetch_openml.<br>
<br>
However, it's going to break examples written in slow-to-update<br>
tutorials or books once the deprecation period is over. But one
could<br>
argue that this is also the case for any other deprecation in<br>
scikit-learn. It's just that sklearn.datasets.load_boston is
used A<br>
LOT: <a
href="https://github.com/search?q=load_boston&type=code"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/search?q=load_boston&type=code</a><br>
<br>
-- <br>
Olivier<br>
_______________________________________________<br>
scikit-learn mailing list<br>
<a href="mailto:scikit-learn@python.org" target="_blank"
moz-do-not-send="true">scikit-learn@python.org</a><br>
<a
href="https://mail.python.org/mailman/listinfo/scikit-learn"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://mail.python.org/mailman/listinfo/scikit-learn</a><br>
</blockquote>
</div>
<br>
</blockquote>
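<p>To make the "deprecation warning with a message pointing to
fetch_openml" option from the quoted thread concrete, here is a
minimal sketch. It is an assumption for illustration only, not
scikit-learn's actual implementation: the function below is a
hypothetical stand-in that loads no data, and the message wording is
invented. The OpenML ID 531 is the one tentatively suggested in the
thread.</p>
<pre>
```python
import warnings

# Hypothetical message text. The OpenML id 531 is the one tentatively
# suggested in the thread; fetch_openml is scikit-learn's OpenML loader.
DEPRECATION_MESSAGE = (
    "load_boston is deprecated because of ethical concerns about the "
    "dataset. If you still need it, fetch it explicitly with "
    "sklearn.datasets.fetch_openml(data_id=531)."
)


def load_boston():
    """Hypothetical stand-in for a deprecated loader; it returns no data."""
    # stacklevel=2 makes the warning point at the caller's line,
    # which is where the user has to change something.
    warnings.warn(DEPRECATION_MESSAGE, FutureWarning, stacklevel=2)
    return None


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        load_boston()
    print(caught[0].category.__name__)
    print(caught[0].message)
```
</pre>
<p>FutureWarning is used in this sketch because Python shows it to
end users by default, whereas DeprecationWarning is hidden in most
contexts.</p>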
</body>
</html>