Dear all,

As suggested in this GitHub issue (https://github.com/scipy/scipy/issues/3870), I would like to discuss the merits of adding a new function, nanpdist, to scipy.spatial. I have also raised the problem in an earlier e-mail (http://comments.gmane.org/gmane.comp.python.scientific.devel/18956) and on SO (http://stackoverflow.com/questions/24781461/compute-the-pairwise-distance-in-scipy-with-missing-values).

Warren suggested three ways to tackle this problem:
  1. Don't change anything--the users should clean up their data!
  2. Add a new function, nanpdist.
  3. Add a keyword argument to pdist that determines how NaN values are treated.

Clearly, I don't favor the first option, since I believe missing values can be important pieces of information, too. I lean slightly towards option two, because adding a keyword would further complicate an already very long pdist function.
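
To make option two a bit more concrete, here is a rough sketch of what I have in mind. Nothing here exists in SciPy: the name nanpdist, its signature, and the rule for handling NaN (skip every coordinate where either of the two observations is NaN, Euclidean metric only) are just assumptions to start the discussion.

```python
import numpy as np


def nanpdist(X):
    """Hypothetical sketch, not an existing SciPy function.

    Pairwise Euclidean distances between the rows of X, using only the
    coordinates that are non-NaN in both rows of each pair.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    # Condensed result: one entry per pair (i, j) with i < j.
    out = np.full(m * (m - 1) // 2, np.nan)
    k = 0
    for i in range(m - 1):
        for j in range(i + 1, m):
            # Coordinates observed in both rows.
            valid = ~(np.isnan(X[i]) | np.isnan(X[j]))
            if valid.any():
                diff = X[i, valid] - X[j, valid]
                out[k] = np.sqrt(np.dot(diff, diff))
            # If no coordinate is shared, the entry stays NaN.
            k += 1
    return out


X = [[0.0, 0.0],
     [3.0, 4.0],
     [np.nan, 4.0]]
print(nanpdist(X))  # [5. 4. 0.]
```

The result uses the same condensed ordering as pdist, i.e. (0, 1), (0, 2), (1, 2). Whether the distances should be rescaled by the number of shared coordinates, and what to return when two rows share none, are open questions.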

I'm happy to submit a pull request if there is a consensus that something should be done.

Best,

Moritz