[scikit-learn] Replacing the Boston Housing Prices dataset

jma jeffrey.m.allard at gmail.com
Thu Jul 6 13:38:02 EDT 2017


I work in the financial services industry and build machine learning 
models for marketing applications. We put an enormous effort (multiple 
layers of oversight and governance) into ensuring that our models are 
free of bias against protected classes. Having data describing race 
and ethnicity (among other attributes) is extremely important for 
validating that this is indeed the case. Without it, you have no such 
assurance.
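
As a rough illustration of why the group labels are needed, here is a 
minimal sketch of one common check: comparing the rate of positive model 
decisions across groups. The function names, the toy data, and the choice 
of metric are all assumptions made for illustration; real governance 
processes use more than this one number.

```python
from collections import defaultdict


def selection_rates(decisions, groups):
    """Return the fraction of positive decisions (1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += decision
    return {g: positives[g] / totals[g] for g in totals}


def disparate_impact_ratio(rates):
    """Lowest group selection rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a positive decision
    return min(rates.values()) / highest


# Hypothetical model decisions (1 = offer made) and hypothetical group labels.
decisions = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(decisions, groups)
ratio = disparate_impact_ratio(rates)
print(rates, ratio)
```

Without the `groups` column there is nothing to pass to 
`selection_rates`, so a gap like the one in this toy data could not be 
measured at all.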


On 07/06/2017 12:19 PM, Andrew Holmes wrote:
> But how do social scientists do research into racism without including 
> ethnicity as a feature in the data?
>
> Best wishes
> Andrew
>
>
>> On 6 Jul 2017, at 17:05, G Reina <greina at eng.ucsd.edu> wrote:
>>
>> I'd like to request that the "Boston Housing Prices" dataset in 
>> sklearn (sklearn.datasets.load_boston) be replaced with the "Ames 
>> Housing Prices" dataset 
>> (https://ww2.amstat.org/publications/jse/v19n3/decock.pdf). I am 
>> willing to submit the code change if the developers agree.
>>
>> The Boston dataset has the feature "Bk is the proportion of blacks in 
>> town". It is an incredibly racist "feature" to include in any 
>> dataset. I think it is beneath us as data scientists.
>>
>> I submit that the Ames dataset is a viable alternative for learning 
>> regression. The author has shown that the dataset is a more robust 
>> replacement for Boston. Ames is a 2011 regression dataset on housing 
>> prices and has more than 5 times as many training examples and over 
>> 7 times as many features (none of which are morally questionable).
>>
>> I welcome the community's thoughts on the matter.
>>
>> Thanks.
>> -Tony
>>
>> Here's an article I wrote on the Boston dataset:
>> https://www.linkedin.com/pulse/hidden-racism-data-science-g-anthony-reina?trk=v-feed&lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3Bmu67f2GSzj5xHMpSD6M00A%3D%3D
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org <mailto:scikit-learn at python.org>
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
