Apologies if this seems obvious to others, but I'm using both the correlation functions from pandas and stats.spearmanr in different bits of my code and noticed something odd. Is the following output expected?

    from numpy import nan
    from pandas import DataFrame
    from scipy import stats

    a = [1, nan, 2]
    b = [1, 2, 2]
    df = DataFrame(list(zip(a, b)))

    stats.spearmanr(a, b)

gives:

    (0.86602540378443871, 0.3333333333333332)

whereas

    df.corr(method="spearman")

gives:

       0  1
    0  1  1
    1  1  1

Removing the nan from a produces identical results. I had expected the first output, but perhaps I'm not understanding how scipy likes to handle nan.

Any advice much appreciated.

Regards,
Ben
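If you reproduce this with a current SciPy, you will likely not see the 0.866 shown above. More recent releases give `stats.spearmanr` a `nan_policy` argument. Its default, `"propagate"`, returns NaN whenever the input contains a NaN, and `"raise"` refuses to compute at all. The minimal sketch below assumes such a release. The six-element `a` and `b` are hypothetical data, not Ben's, with one NaN and no ties.

```python
import math

import numpy as np
from scipy import stats

# Hypothetical data: one NaN in `a`, no ties among the remaining values.
a = [1, np.nan, 2, 4, 3, 5]
b = [2, 7, 1, 3, 5, 4]

# Default nan_policy="propagate": a NaN anywhere in the input makes the result NaN.
rho_propagate, p_propagate = stats.spearmanr(a, b)
print(math.isnan(rho_propagate))  # True

# nan_policy="raise": refuse to compute and raise ValueError instead.
try:
    stats.spearmanr(a, b, nan_policy="raise")
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```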
On Wed, Apr 4, 2012 at 3:54 PM, Ben <benwhalley@gmail.com> wrote:
scipy.stats doesn't handle NaNs in most cases. They are not treated specially, so what the outcome is depends on the implementation details.

The correct answer should be in stats.mstats, which uses masked arrays to handle NaN cases:
    >>> import numpy as np
    >>> am = np.ma.fix_invalid(a)
    >>> bm = np.ma.fix_invalid(b)
    >>> stats.mstats.spearmanr(am, bm)
    (1.0, 0.0)
Josef
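This sketch assumes a SciPy recent enough to have `nan_policy`. On that assumption, the masked-array route should agree with `stats.spearmanr(..., nan_policy="omit")`, because both discard every pair in which either value is NaN. The data are a small hypothetical set with one NaN. Once the incomplete pair is dropped, the ranks are [1, 2, 4, 3, 5] and [2, 1, 3, 5, 4]. The squared rank differences sum to 8, so rho = 1 - 6*8/(5*24) = 0.6.

```python
import numpy as np
from scipy import stats

# Hypothetical data: one NaN in `a`.
a = [1, np.nan, 2, 4, 3, 5]
b = [2, 7, 1, 3, 5, 4]

# Masked-array route: fix_invalid masks the NaN, and mstats drops masked pairs.
am = np.ma.fix_invalid(a)
bm = np.ma.fix_invalid(b)
rho_masked, p_masked = stats.mstats.spearmanr(am, bm)

# nan_policy="omit": the incomplete pair (index 1) is left out before ranking.
rho_omit, p_omit = stats.spearmanr(a, b, nan_policy="omit")

print(float(rho_masked), float(rho_omit))  # both approximately 0.6
```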
_______________________________________________
SciPy-User mailing list
SciPy-User@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
On Wed, Apr 4, 2012 at 6:34 PM, <josef.pktd@gmail.com> wrote:
pandas excludes NaNs by default, so the output looks correct based on what Josef wrote.
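This behaviour can be checked directly. `DataFrame.corr` computes each pairwise correlation using only the rows where both columns are non-null. Its `min_periods` argument sets the minimum number of such rows, and any pair with fewer rows gets NaN. The minimal sketch below assumes a recent pandas and uses hypothetical data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one NaN in column "a".
df = pd.DataFrame({"a": [1, np.nan, 2, 4, 3, 5], "b": [2, 7, 1, 3, 5, 4]})

# Pairwise exclusion: row 1 is left out when correlating "a" with "b".
rho_pairwise = df.corr(method="spearman").loc["a", "b"]

# The same thing done explicitly by dropping incomplete rows first.
rho_dropna = df.dropna().corr(method="spearman").loc["a", "b"]

# Only 5 complete pairs exist, so requiring at least 6 yields NaN.
rho_strict = df.corr(method="spearman", min_periods=6).loc["a", "b"]

print(rho_pairwise, rho_dropna, rho_strict)
```

With only two columns, `dropna()` and pairwise exclusion drop the same rows. With more columns, they can differ, because pairwise exclusion keeps a row for any pair of columns in which both values are present.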
participants (3):
- Ben
- josef.pktd@gmail.com
- Wes McKinney