<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">But how do social scientists do research into racism without including ethnicity as a feature in the data?<div class=""><br class=""><div class="">
<div style="orphans: auto; text-align: start; text-indent: 0px; widows: auto; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="orphans: auto; text-align: start; text-indent: 0px; widows: auto; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="orphans: auto; text-align: start; text-indent: 0px; widows: auto; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="orphans: auto; text-align: start; text-indent: 0px; widows: auto; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="orphans: auto; text-align: start; text-indent: 0px; widows: auto; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="orphans: auto; text-align: start; text-indent: 0px; widows: auto; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="orphans: auto; text-align: start; text-indent: 0px; widows: auto; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="orphans: auto; text-align: start; text-indent: 0px; widows: auto; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="orphans: auto; text-align: start; text-indent: 0px; widows: auto; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="color: rgb(0, 0, 0); letter-spacing: normal; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">Best wishes<br class=""><div class=""><font color="#000000" class="">Andrew</font></div><div class=""><br class=""></div></div><div class=""></div><div style="color: rgb(0, 0, 0); letter-spacing: normal; text-transform: none; white-space: normal; 
word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""></div></div></div></div></div></div></div></div></div></div><br class="Apple-interchange-newline">
</div>
<br class=""><div><blockquote type="cite" class=""><div class="">On 6 Jul 2017, at 17:05, G Reina &lt;<a href="mailto:greina@eng.ucsd.edu" class="">greina@eng.ucsd.edu</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class=""><div class=""><div class=""><div class=""><div class="">I'd like to request that the "Boston Housing Prices" dataset in sklearn (sklearn.datasets.load_boston) be replaced with the "Ames Housing Prices" dataset (<a href="https://ww2.amstat.org/publications/jse/v19n3/decock.pdf" class="">https://ww2.amstat.org/publications/jse/v19n3/decock.pdf</a>). I am willing to submit the code change if the developers agree.<br class=""><br class=""></div>The Boston dataset has the feature "Bk is the proportion
of blacks in town". It is an incredibly racist "feature" to include in any dataset. I think it is beneath us as data scientists.<br class=""><br class=""></div>I submit that the Ames dataset is a viable alternative for learning regression. The author has shown that the dataset is a more robust replacement for Boston. Ames is a 2011 regression dataset on housing prices and has more than 5 times the number of training examples and over 7 times as many features (none of which are morally questionable). <br class=""><br class=""></div>I welcome the community's thoughts on the matter.<br class=""><br class=""></div>Thanks.<br class=""></div>-Tony<br class=""><br class="">Here's an article I wrote on the Boston dataset:<br class=""><a href="https://www.linkedin.com/pulse/hidden-racism-data-science-g-anthony-reina?trk=v-feed&lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3Bmu67f2GSzj5xHMpSD6M00A%3D%3D" class="">https://www.linkedin.com/pulse/hidden-racism-data-science-g-anthony-reina?trk=v-feed&lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3Bmu67f2GSzj5xHMpSD6M00A%3D%3D</a><br class=""><br class=""></div>
_______________________________________________<br class="">scikit-learn mailing list<br class=""><a href="mailto:scikit-learn@python.org" class="">scikit-learn@python.org</a><br class="">https://mail.python.org/mailman/listinfo/scikit-learn<br class=""></div></blockquote></div><br class=""></div></body></html>