[scikit-learn] Replacing the Boston Housing Prices dataset

Sean Violante sean.violante at gmail.com
Thu Jul 6 15:08:33 EDT 2017


G Reina
You make a bizarre argument. Are you arguing that we should not even check
racism as a possible factor in house prices?

But then you yourself check whether it's relevant.
Then you say:

"but I'd argue that it's more due to the location (near water, near
businesses, near restaurants, near parks and recreation) than to the ethnic
makeup"

That was basically what the original authors wanted to show too:

Harrison, D. and Rubinfeld, D.L. `Hedonic prices and the demand for clean
air', J. Environ. Economics & Management, vol.5, 81-102, 1978.

But unless you measure ethnic make-up, you cannot show that it is not a
confounder.
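
To make the confounder point concrete, here is a minimal sketch on purely
synthetic data (nothing below comes from the Boston dataset; the variable
names and effect sizes are assumptions chosen for illustration). Price depends
on both pollution and a second factor that is correlated with pollution. When
that factor is left out of the regression, the pollution coefficient absorbs
part of its effect, and from that fit alone you cannot tell.

```python
import numpy as np


def fit_ols(features, target):
    """Return least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 5000

# Synthetic, made-up data: all effect sizes here are illustrative assumptions.
confounder = rng.normal(size=n)                    # factor that may go unmeasured
pollution = 0.8 * confounder + rng.normal(size=n)  # correlated with the confounder
price = -1.0 * pollution - 2.0 * confounder + rng.normal(size=n)

# Model 1 omits the confounder; model 2 includes it.
coef_omitted = fit_ols(pollution, price)
coef_full = fit_ols(np.column_stack([pollution, confounder]), price)

pollution_effect_omitted = coef_omitted[1]
pollution_effect_full = coef_full[1]
print("pollution effect, confounder omitted: ", pollution_effect_omitted)
print("pollution effect, confounder included:", pollution_effect_full)
```

Only the second fit recovers a pollution effect close to the value used to
generate the data, which is why the variable has to be measured before it can
be ruled out.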

The term "white flight" refers to affluent white families moving to the
suburbs. Clearly, one question is whether, and how much, that move was driven
by racism or by avoiding air pollution.





On 6 Jul 2017 6:10 pm, "G Reina" <greina at eng.ucsd.edu> wrote:

> I'd like to request that the "Boston Housing Prices" dataset in sklearn
> (sklearn.datasets.load_boston) be replaced with the "Ames Housing Prices"
> dataset (https://ww2.amstat.org/publications/jse/v19n3/decock.pdf). I am
> willing to submit the code change if the developers agree.
>
> The Boston dataset has the feature "Bk is the proportion of blacks in
> town". It is an incredibly racist "feature" to include in any dataset. I
> think it is beneath us as data scientists.
>
> I submit that the Ames dataset is a viable alternative for learning
> regression. The author has shown that the dataset is a more robust
> replacement for Boston. Ames is a 2011 regression dataset on housing prices
> and has more than 5 times the number of training examples, with over 7 times
> as many features (none of which are morally questionable).
>
> I welcome the community's thoughts on the matter.
>
> Thanks.
> -Tony
>
> Here's an article I wrote on the Boston dataset:
> https://www.linkedin.com/pulse/hidden-racism-data-science-g-anthony-reina?trk=v-feed&lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3Bmu67f2GSzj5xHMpSD6M00A%3D%3D