<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hi Tony.<br>

    <br>

    I don't think it's a good idea to remove the dataset, given how many

    tutorials and examples rely on it.<br>

    I also don't think it's a good idea to ignore racial discrimination,

    which I guess this feature is trying to capture.<br>

    <br>

    I was recently asked to remove a dataset excerpt from one of my

    slides because it was "too racist". It was randomly sampled

    data from the adult census dataset. Unfortunately, economics in the

    US are not color-blind (yet), and the reality is racist.<br>

    I haven't done an in-depth analysis of whether this feature is

    actually informative, but I don't think your analysis is conclusive.<br>

    <br>

    Including ethnicity in data actually allows us to ensure "fairness"

    in certain decision-making processes.<br>

    Without collecting this data, it would be impossible to ensure that

    automated decisions are not influenced

    by past human biases. Arguably that's not what the authors of this

    dataset are doing.<br>

    <br>

    Check out <a class="moz-txt-link-freetext" href="http://www.fatml.org/">http://www.fatml.org/</a> for more on fairness in machine

    learning and data science.<br>

    <br>

    Cheers,<br>

    Andy<br>

    <br>

    <br>

    <div class="moz-cite-prefix">On 07/06/2017 12:05 PM, G Reina wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAEBTegQhq3x1Jgm5zBY+MQ1kjB6ryfT0X8r5-esk6NP_5WY9rQ@mail.gmail.com">

      <div dir="ltr">

        <div>

          <div>

            <div>

              <div>

                <div>I'd like to request that the "Boston Housing

                  Prices" dataset in sklearn

                  (sklearn.datasets.load_boston) be replaced with the

                  "Ames Housing Prices" dataset (<a

                    href="https://ww2.amstat.org/publications/jse/v19n3/decock.pdf"

                    moz-do-not-send="true">https://ww2.amstat.org/publications/jse/v19n3/decock.pdf</a>).

                  I am willing to submit the code change if the

                  developers agree.<br>

                  <br>

                </div>

                The Boston dataset has the feature "Bk is the proportion

                of blacks in town". It is an incredibly racist "feature"

                to include in any dataset. I think it is beneath us as data

                scientists.<br>

                <br>

              </div>

              I submit that the Ames dataset is a viable alternative for

              learning regression. The author has shown that the dataset

              is a more robust replacement for Boston. Ames is a 2011

              regression dataset on housing prices and has more than 5

              times the number of training examples and over 7 times as

              many features (none of which are morally questionable). <br>

              <br>

            </div>

            I welcome the community's thoughts on the matter.<br>

            <br>

          </div>

          Thanks.<br>

        </div>

        -Tony<br>

        <br>

        Here's an article I wrote on the Boston dataset:<br>

        <a

href="https://www.linkedin.com/pulse/hidden-racism-data-science-g-anthony-reina?trk=v-feed&lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3Bmu67f2GSzj5xHMpSD6M00A%3D%3D"

          moz-do-not-send="true">https://www.linkedin.com/pulse/hidden-racism-data-science-g-anthony-reina?trk=v-feed&lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3Bmu67f2GSzj5xHMpSD6M00A%3D%3D</a><br>

        <br>

      </div>

      <br>


      <pre wrap="">_______________________________________________

scikit-learn mailing list

<a class="moz-txt-link-abbreviated" href="mailto:scikit-learn@python.org">scikit-learn@python.org</a>

<a class="moz-txt-link-freetext" href="https://mail.python.org/mailman/listinfo/scikit-learn">https://mail.python.org/mailman/listinfo/scikit-learn</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>