<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>I work in the financial services industry and build machine
learning models for marketing applications. We put enormous
effort (multiple layers of oversight and governance) into ensuring
that our models are free of bias against protected classes.
Data describing race and ethnicity (among other attributes) is
extremely important for validating that this is indeed the case.
Without it, you have no such assurance. <br>
</p>
<br>
<div class="moz-cite-prefix">On 07/06/2017 12:19 PM, Andrew Holmes
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:E0CCC6AC-09BB-4F4B-A3EE-2918F91D0850@icloud.com">
But how do social scientists do research into racism without
including ethnicity as a feature in the data?
<div class=""><br class="">
<div class="">
<div style="orphans: auto; text-align: start; text-indent:
0px; widows: auto; word-wrap: break-word; -webkit-nbsp-mode:
space; -webkit-line-break: after-white-space;" class="">
<div style="orphans: auto; text-align: start; text-indent:
0px; widows: auto; word-wrap: break-word;
-webkit-nbsp-mode: space; -webkit-line-break:
after-white-space;" class="">
<div style="orphans: auto; text-align: start; text-indent:
0px; widows: auto; word-wrap: break-word;
-webkit-nbsp-mode: space; -webkit-line-break:
after-white-space;" class="">
<div style="orphans: auto; text-align: start;
text-indent: 0px; widows: auto; word-wrap: break-word;
-webkit-nbsp-mode: space; -webkit-line-break:
after-white-space;" class="">
<div style="orphans: auto; text-align: start;
text-indent: 0px; widows: auto; word-wrap:
break-word; -webkit-nbsp-mode: space;
-webkit-line-break: after-white-space;" class="">
<div style="orphans: auto; text-align: start;
text-indent: 0px; widows: auto; word-wrap:
break-word; -webkit-nbsp-mode: space;
-webkit-line-break: after-white-space;" class="">
<div style="orphans: auto; text-align: start;
text-indent: 0px; widows: auto; word-wrap:
break-word; -webkit-nbsp-mode: space;
-webkit-line-break: after-white-space;" class="">
<div style="orphans: auto; text-align: start;
text-indent: 0px; widows: auto; word-wrap:
break-word; -webkit-nbsp-mode: space;
-webkit-line-break: after-white-space;"
class="">
<div style="orphans: auto; text-align: start;
text-indent: 0px; widows: auto; word-wrap:
break-word; -webkit-nbsp-mode: space;
-webkit-line-break: after-white-space;"
class="">
<div style="color: rgb(0, 0, 0);
letter-spacing: normal; text-transform:
none; white-space: normal; word-spacing:
0px; -webkit-text-stroke-width: 0px;"
class="">Best wishes<br class="">
<div class=""><font class=""
color="#000000">Andrew</font></div>
<div class=""><br class="">
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br class="Apple-interchange-newline">
</div>
<br class="">
<div>
<blockquote type="cite" class="">
<div class="">On 6 Jul 2017, at 17:05, G Reina &lt;<a
href="mailto:greina@eng.ucsd.edu" class=""
moz-do-not-send="true">greina@eng.ucsd.edu</a>&gt;
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="ltr" class="">
<div class="">
<div class="">
<div class="">
<div class="">
<div class="">I'd like to request that the
"Boston Housing Prices" dataset in sklearn
(sklearn.datasets.load_boston) be replaced
with the "Ames Housing Prices" dataset (<a
href="https://ww2.amstat.org/publications/jse/v19n3/decock.pdf"
class="" moz-do-not-send="true">https://ww2.amstat.org/publications/jse/v19n3/decock.pdf</a>).
I am willing to submit the code change if the
developers agree.<br class="">
<br class="">
</div>
The Boston dataset has the feature "Bk is the
proportion of blacks in town". It is an incredibly racist
"feature" to include in any dataset. I think it is
beneath us as data scientists.<br class="">
<br class="">
</div>
I submit that the Ames dataset is a viable
alternative for learning regression. The author
has shown that the dataset is a more robust
replacement for Boston. Ames is a 2011 regression
dataset on housing prices and has more than 5
times the number of training examples and over 7
times as many features (none of which are morally
questionable). <br class="">
<br class="">
</div>
I welcome the community's thoughts on the matter.<br
class="">
<br class="">
</div>
Thanks.<br class="">
</div>
-Tony<br class="">
<br class="">
Here's an article I wrote on the Boston dataset:<br
class="">
<a
href="https://www.linkedin.com/pulse/hidden-racism-data-science-g-anthony-reina?trk=v-feed&lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3Bmu67f2GSzj5xHMpSD6M00A%3D%3D"
class="" moz-do-not-send="true">https://www.linkedin.com/pulse/hidden-racism-data-science-g-anthony-reina?trk=v-feed&lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3Bmu67f2GSzj5xHMpSD6M00A%3D%3D</a><br
class="">
<br class="">
</div>
_______________________________________________<br
class="">
scikit-learn mailing list<br class="">
<a href="mailto:scikit-learn@python.org" class=""
moz-do-not-send="true">scikit-learn@python.org</a><br
class="">
<a class="moz-txt-link-freetext" href="https://mail.python.org/mailman/listinfo/scikit-learn">https://mail.python.org/mailman/listinfo/scikit-learn</a><br
class="">
</div>
</blockquote>
</div>
<br class="">
</div>
<br>
</blockquote>
<br>
</body>
</html>