[scikit-learn] Replacing the Boston Housing Prices dataset

Andrew Holmes andrewholmes82 at icloud.com
Thu Jul 6 12:19:49 EDT 2017


But how do social scientists do research into racism without including ethnicity as a feature in the data?

Best wishes
Andrew


> On 6 Jul 2017, at 17:05, G Reina <greina at eng.ucsd.edu> wrote:
> 
> I'd like to request that the "Boston Housing Prices" dataset in sklearn (sklearn.datasets.load_boston) be replaced with the "Ames Housing Prices" dataset (https://ww2.amstat.org/publications/jse/v19n3/decock.pdf). I am willing to submit the code change if the developers agree.
> 
> The Boston dataset has the feature "Bk is the proportion of blacks in town". It is an incredibly racist "feature" to include in any dataset. I think it is beneath us as data scientists.
> 
> I submit that the Ames dataset is a viable alternative for learning regression. Its author has shown that it is a more robust replacement for Boston. Ames is a 2011 regression dataset on housing prices and has more than 5 times as many training examples and over 7 times as many features (none of which are morally questionable).
> 
> I welcome the community's thoughts on the matter.
> 
> Thanks.
> -Tony
> 
> Here's an article I wrote on the Boston dataset:
> https://www.linkedin.com/pulse/hidden-racism-data-science-g-anthony-reina?trk=v-feed&lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3Bmu67f2GSzj5xHMpSD6M00A%3D%3D
> 
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

