[Scipy-svn] r4665 - trunk/scipy/cluster
scipy-svn at scipy.org
Fri Aug 22 20:42:18 EDT 2008
Author: damian.eads
Date: 2008-08-22 19:42:16 -0500 (Fri, 22 Aug 2008)
New Revision: 4665
Modified:
trunk/scipy/cluster/distance.py
Log:
Converted the documentation to restructured text.
Modified: trunk/scipy/cluster/distance.py
===================================================================
--- trunk/scipy/cluster/distance.py 2008-08-22 06:38:49 UTC (rev 4664)
+++ trunk/scipy/cluster/distance.py 2008-08-23 00:42:16 UTC (rev 4665)
@@ -1,32 +1,90 @@
"""
+
+Function Reference
+------------------
+
Distance matrix computation from a collection of raw observation vectors
+stored in a rectangular array.
- pdist computes distances between each observation pair.
++------------------+-------------------------------------------------+
+|pdist | computes distances between observation pairs. |
++------------------+-------------------------------------------------+
-Distance functions between two vectors u and v
+Distance functions between two vectors ``u`` and ``v``. Computing
+distances over a large collection of vectors is inefficient for these
+functions. Use ``pdist`` for this purpose.
- braycurtis the Bray-Curtis distance.
- canberra the Canberra distance.
- chebyshev the Chebyshev distance.
- cityblock the Manhattan distance.
- correlation the Correlation distance.
- cosine the Cosine distance.
- dice the Dice dissimilarity (boolean).
- euclidean the Euclidean distance.
- hamming the Hamming distance (boolean).
- jaccard the Jaccard distance (boolean).
- kulsinski the Kulsinski distance (boolean).
- mahalanobis the Mahalanobis distance.
- matching the matching dissimilarity (boolean).
- minkowski the Minkowski distance.
- rogerstanimoto the Rogers-Tanimoto dissimilarity (boolean).
- russellrao the Russell-Rao dissimilarity (boolean).
- seuclidean the normalized Euclidean distance.
- sokalmichener the Sokal-Michener dissimilarity (boolean).
- sokalsneath the Sokal-Sneath dissimilarity (boolean).
- sqeuclidean the squared Euclidean distance.
- yule the Yule dissimilarity (boolean).
++------------------+-------------------------------------------------+
+|braycurtis | the Bray-Curtis distance. |
+|canberra | the Canberra distance. |
+|chebyshev | the Chebyshev distance. |
+|cityblock | the Manhattan distance. |
+|correlation | the Correlation distance. |
+|cosine | the Cosine distance. |
+|dice | the Dice dissimilarity (boolean). |
+|euclidean | the Euclidean distance. |
+|hamming | the Hamming distance (boolean). |
+|jaccard | the Jaccard distance (boolean). |
+|kulsinski | the Kulsinski distance (boolean). |
+|mahalanobis | the Mahalanobis distance. |
+|matching | the matching dissimilarity (boolean). |
+|minkowski | the Minkowski distance. |
+|rogerstanimoto | the Rogers-Tanimoto dissimilarity (boolean). |
+|russellrao | the Russell-Rao dissimilarity (boolean). |
+|seuclidean | the normalized Euclidean distance. |
+|sokalmichener | the Sokal-Michener dissimilarity (boolean). |
+|sokalsneath | the Sokal-Sneath dissimilarity (boolean). |
+|sqeuclidean | the squared Euclidean distance. |
+|yule | the Yule dissimilarity (boolean). |
++------------------+-------------------------------------------------+
+
+References
+----------
+
+.. [Sta07] "Statistics toolbox." API Reference Documentation. The MathWorks.
+ http://www.mathworks.com/access/helpdesk/help/toolbox/stats/.
+ Accessed October 1, 2007.
+
+.. [Mti07] "Hierarchical clustering." API Reference Documentation.
+ The Wolfram Research, Inc.
+ http://reference.wolfram.com/mathematica/HierarchicalClustering/tutorial/HierarchicalClustering.html.
+ Accessed October 1, 2007.
+
+.. [Gow69] Gower, JC and Ross, GJS. "Minimum Spanning Trees and Single Linkage
+ Cluster Analysis." Applied Statistics. 18(1): pp. 54--64. 1969.
+
+.. [War63] Ward Jr, JH. "Hierarchical grouping to optimize an objective
+ function." Journal of the American Statistical Association. 58(301):
+ pp. 236--44. 1963.
+
+.. [Joh66] Johnson, SC. "Hierarchical clustering schemes." Psychometrika.
+ 32(2): pp. 241--54. 1966.
+
+.. [Sne62] Sneath, PH and Sokal, RR. "Numerical taxonomy." Nature. 193: pp.
+ 855--60. 1962.
+
+.. [Bat95] Batagelj, V. "Comparing resemblance measures." Journal of
+ Classification. 12: pp. 73--90. 1995.
+
+.. [Sok58] Sokal, RR and Michener, CD. "A statistical method for evaluating
+ systematic relationships." Scientific Bulletins. 38(22):
+ pp. 1409--38. 1958.
+
+.. [Ede79] Edelbrock, C. "Mixture model tests of hierarchical clustering
+ algorithms: the problem of classifying everybody." Multivariate
+ Behavioral Research. 14: pp. 367--84. 1979.
+
+.. [Jai88] Jain, A., and Dubes, R., "Algorithms for Clustering Data."
+ Prentice-Hall. Englewood Cliffs, NJ. 1988.
+
+.. [Fis36] Fisher, RA. "The use of multiple measurements in taxonomic
+ problems." Annals of Eugenics. 7(2): pp. 179--88. 1936.
+
+
+Copyright Notice
+----------------
+
Copyright (C) Damian Eads, 2007-2008. New BSD License.
"""
@@ -72,11 +130,24 @@
def minkowski(u, v, p):
"""
- d = minkowski(u, v, p)
+ Computes the Minkowski distance between two vectors ``u`` and ``v``,
+ defined as
- Returns the Minkowski distance between two vectors u and v,
+ .. math::
- ||u-v||_p = (\sum {|u_i - v_i|^p})^(1/p).
+ {||u-v||}_p = (\sum {|u_i - v_i|^p})^{1/p}.
+
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
+ p : double
+ The order of the norm of the difference :math:`{||u-v||}_p`.
+
+ :Returns:
+ d : double
+ The Minkowski distance between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -86,9 +157,22 @@
def euclidean(u, v):
"""
- d = euclidean(u, v)
+ Computes the Euclidean distance between two n-vectors ``u`` and ``v``,
+ which is defined as
- Computes the Euclidean distance between two n-vectors u and v, ||u-v||_2
+ .. math::
+
+ {||u-v||}_2
+
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
+
+ :Returns:
+ d : double
+ The Euclidean distance between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -97,10 +181,23 @@
def sqeuclidean(u, v):
"""
- d = sqeuclidean(u, v)
+ Computes the squared Euclidean distance between two n-vectors u and v,
+ which is defined as
- Computes the squared Euclidean distance between two n-vectors u and v,
- (||u-v||_2)^2.
+ .. math::
+
+ {||u-v||}_2^2.
+
+
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
+
+ :Returns:
+ d : double
+ The squared Euclidean distance between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -108,10 +205,22 @@
def cosine(u, v):
"""
- d = cosine(u, v)
+ Computes the Cosine distance between two n-vectors u and v, which
+ is defined as
- Computes the Cosine distance between two n-vectors u and v,
- (1-uv^T)/(||u||_2 * ||v||_2).
+ .. math::
+
+ 1 - \frac{uv^T}{||u||_2 ||v||_2}.
+
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
+
+ :Returns:
+ d : double
+ The Cosine distance between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -120,16 +229,26 @@
def correlation(u, v):
"""
- d = correlation(u, v)
+ Computes the correlation distance between two n-vectors ``u`` and
+ ``v``, which is defined as
- Computes the correlation distance between two n-vectors u and v,
+ .. math::
- 1 - (u - n|u|_1)(v - n|v|_1)^T
- --------------------------------- ,
- |(u - n|u|_1)|_2 |(v - n|v|_1)|^T
+ 1 - \frac{(u - \bar{u}){(v - \bar{v})}^T}
+ {{||u - \bar{u}||}_2 {||v - \bar{v}||}_2}
- where |*|_1 is the Manhattan norm and n is the common dimensionality
- of the vectors.
+ where :math:`\bar{u}` and :math:`\bar{v}` are the means of the
+ elements of ``u`` and ``v``, respectively.
+
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
+
+ :Returns:
+ d : double
+ The correlation distance between vectors ``u`` and ``v``.
"""
umu = u.mean()
vmu = v.mean()
@@ -141,19 +260,28 @@
def hamming(u, v):
"""
- d = hamming(u, v)
+ Computes the Hamming distance between two n-vectors ``u`` and
+ ``v``, which is simply the proportion of disagreeing components in
+ ``u`` and ``v``. If ``u`` and ``v`` are boolean vectors, the Hamming
+ distance is
- Computes the Hamming distance between two n-vectors u and v,
- which is simply the proportion of disagreeing components in u
- and v. If u and v are boolean vectors, the hamming distance is
+ .. math::
- (c_{01} + c_{10}) / n
+ \frac{c_{01} + c_{10}}{n}
- where c_{ij} is the number of occurrences of
+ where :math:`c_{ij}` is the number of occurrences of
+ :math:`\mathtt{u[k]} = i` and :math:`\mathtt{v[k]} = j` for
+ :math:`k < n`.
- u[k] == i and v[k] == j
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
- for k < n.
+ :Returns:
+ d : double
+ The Hamming distance between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -161,20 +289,27 @@
def jaccard(u, v):
"""
- d = jaccard(u, v)
+ Computes the Jaccard-Needham dissimilarity between two boolean
+ n-vectors u and v, which is
- Computes the Jaccard-Needham dissimilarity between two boolean
- n-vectors u and v, which is
+ .. math::
- c_{TF} + c_{FT}
- ------------------------
- c_{TT} + c_{FT} + c_{TF}
+ \frac{c_{TF} + c_{FT}}
+ {c_{TT} + c_{FT} + c_{TF}}
- where c_{ij} is the number of occurrences of
+ where :math:`c_{ij}` is the number of occurrences of
+ :math:`\mathtt{u[k]} = i` and :math:`\mathtt{v[k]} = j` for
+ :math:`k < n`.
- u[k] == i and v[k] == j
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
- for k < n.
+ :Returns:
+ d : double
+ The Jaccard distance between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -184,20 +319,27 @@
def kulsinski(u, v):
"""
- d = kulsinski(u, v)
+ Computes the Kulsinski dissimilarity between two boolean n-vectors
+ u and v, which is defined as
- Computes the Kulsinski dissimilarity between two boolean n-vectors
- u and v, which is
+ .. math::
- c_{TF} + c_{FT} - c_{TT} + n
- ----------------------------
- c_{FT} + c_{TF} + n
+ \frac{c_{TF} + c_{FT} - c_{TT} + n}
+ {c_{FT} + c_{TF} + n}
- where c_{ij} is the number of occurrences of
+ where :math:`c_{ij}` is the number of occurrences of
+ :math:`\mathtt{u[k]} = i` and :math:`\mathtt{v[k]} = j` for
+ :math:`k < n`.
- u[k] == i and v[k] == j
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
- for k < n.
+ :Returns:
+ d : double
+ The Kulsinski distance between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -208,11 +350,20 @@
def seuclidean(u, v, V):
"""
- d = seuclidean(u, v, V)
+ Returns the standardized Euclidean distance between two n-vectors
+ ``u`` and ``v``. ``V`` is an n-dimensional vector of component
+ variances. It is usually computed over a larger collection of
+ vectors.
- Returns the standardized Euclidean distance between two
- n-vectors u and v. V is a m-dimensional vector of component
- variances. It is usually computed among a larger collection vectors.
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
+
+ :Returns:
+ d : double
+ The standardized Euclidean distance between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -223,10 +374,22 @@
def cityblock(u, v):
"""
- d = cityblock(u, v)
+ Computes the Manhattan distance between two n-vectors u and v,
+ which is defined as
- Computes the Manhattan distance between two n-vectors u and v,
- \sum {u_i-v_i}.
+ .. math::
+
+ \sum_i {|u_i-v_i|}.
+
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
+
+ :Returns:
+ d : double
+ The City Block distance between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -234,11 +397,23 @@
def mahalanobis(u, v, VI):
"""
- d = mahalanobis(u, v, VI)
+ Computes the Mahalanobis distance between two n-vectors ``u`` and ``v``,
+ which is defined as
- Computes the Mahalanobis distance between two n-vectors u and v,
- (u-v)VI(u-v)^T
- where VI is the inverse covariance matrix.
+ .. math::
+ (u-v)V^{-1}(u-v)^T
+
+ where ``VI`` is the inverse covariance matrix :math:`V^{-1}`.
+
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
+
+ :Returns:
+ d : double
+ The Mahalanobis distance between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -247,10 +422,21 @@
def chebyshev(u, v):
"""
- d = chebyshev(u, v)
+ Computes the Chebyshev distance between two n-vectors u and v,
+ which is defined as
- Computes the Chebyshev distance between two n-vectors u and v,
- \max {|u_i-v_i|}.
+ .. math::
+ \max_i {|u_i-v_i|}.
+
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
+
+ :Returns:
+ d : double
+ The Chebyshev distance between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -258,10 +444,22 @@
def braycurtis(u, v):
"""
- d = braycurtis(u, v)
+ Computes the Bray-Curtis distance between two n-vectors ``u`` and
+ ``v``, which is defined as
- Computes the Bray-Curtis distance between two n-vectors u and v,
- \sum{|u_i-v_i|} / \sum{|u_i+v_i|}.
+ .. math::
+
+ \sum{|u_i-v_i|} / \sum{|u_i+v_i|}.
+
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
+
+ :Returns:
+ d : double
+ The Bray-Curtis distance between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -269,10 +467,24 @@
def canberra(u, v):
"""
- d = canberra(u, v)
+ Computes the Canberra distance between two n-vectors u and v,
+ which is defined as
- Computes the Canberra distance between two n-vectors u and v,
- \sum{|u_i-v_i|} / \sum{|u_i|+|v_i}.
+ .. math::
+
+ \frac{\sum_i {|u_i-v_i|}}
+ {\sum_i {|u_i|+|v_i|}}.
+
+
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
+
+ :Returns:
+ d : double
+ The Canberra distance between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -318,20 +530,28 @@
def yule(u, v):
"""
- d = yule(u, v)
- Computes the Yule dissimilarity between two boolean n-vectors u and v,
+ Computes the Yule dissimilarity between two boolean n-vectors u and v,
+ which is defined as
- R
- ---------------------
- c_{TT} + c_{FF} + R/2
- where c_{ij} is the number of occurrences of
+ .. math::
- u[k] == i and v[k] == j
+ \frac{R}
+ {c_{TT} + c_{FF} + \frac{R}{2}}
- for k < n, and
+ where :math:`c_{ij}` is the number of occurrences of
+ :math:`\mathtt{u[k]} = i` and :math:`\mathtt{v[k]} = j` for
+ :math:`k < n` and :math:`R = 2(c_{TF} + c_{FT})`.
- R = 2.0 * (c_{TF} + c_{FT}).
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
+
+ :Returns:
+ d : double
+ The Yule dissimilarity between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -341,18 +561,26 @@
def matching(u, v):
"""
- d = matching(u, v)
+ Computes the Matching dissimilarity between two boolean n-vectors
+ u and v, which is defined as
- Computes the Matching dissimilarity between two boolean n-vectors
- u and v, which is
+ .. math::
- (c_{TF} + c_{FT}) / n
+ \frac{c_{TF} + c_{FT}}{n}
- where c_{ij} is the number of occurrences of
+ where :math:`c_{ij}` is the number of occurrences of
+ :math:`\mathtt{u[k]} = i` and :math:`\mathtt{v[k]} = j` for
+ :math:`k < n`.
- u[k] == i and v[k] == j
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
- for k < n.
+ :Returns:
+ d : double
+ The Matching dissimilarity between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -361,20 +589,27 @@
def dice(u, v):
"""
- d = dice(u, v)
+ Computes the Dice dissimilarity between two boolean n-vectors
+ ``u`` and ``v``, which is
- Computes the Dice dissimilarity between two boolean n-vectors
- u and v, which is
+ .. math::
- c_{TF} + c_{FT}
- ----------------------------
- 2 * c_{TT} + c_{FT} + c_{TF}
+ \frac{c_{TF} + c_{FT}}
+ {2c_{TT} + c_{FT} + c_{TF}}
- where c_{ij} is the number of occurrences of
+ where :math:`c_{ij}` is the number of occurrences of
+ :math:`\mathtt{u[k]} = i` and :math:`\mathtt{v[k]} = j` for
+ :math:`k < n`.
- u[k] == i and v[k] == j
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
- for k < n.
+ :Returns:
+ d : double
+ The Dice dissimilarity between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -387,23 +622,27 @@
def rogerstanimoto(u, v):
"""
- d = rogerstanimoto(u, v)
+ Computes the Rogers-Tanimoto dissimilarity between two boolean
+ n-vectors ``u`` and ``v``, which is defined as
- Computes the Rogers-Tanimoto dissimilarity between two boolean
- n-vectors u and v,
+ .. math::
+ \frac{R}
+ {c_{TT} + c_{FF} + R}
- R
- -------------------
- c_{TT} + c_{FF} + R
+ where :math:`c_{ij}` is the number of occurrences of
+ :math:`\mathtt{u[k]} = i` and :math:`\mathtt{v[k]} = j` for
+ :math:`k < n` and :math:`R = 2(c_{TF} + c_{FT})`.
- where c_{ij} is the number of occurrences of
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
- u[k] == i and v[k] == j
-
- for k < n, and
-
- R = 2.0 * (c_{TF} + c_{FT}).
-
+ :Returns:
+ d : double
+ The Rogers-Tanimoto dissimilarity between vectors
+ ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -412,11 +651,27 @@
def russellrao(u, v):
"""
- d = russellrao(u, v)
+ Computes the Russell-Rao dissimilarity between two boolean n-vectors
+ ``u`` and ``v``, which is defined as
- Computes the Russell-Rao dissimilarity between two boolean n-vectors
- u and v, (n - c_{TT}) / n where c_{ij} is the number of occurrences
- of u[k] == i and v[k] == j for k < n.
+ .. math::
+
+ \frac{n - c_{TT}}
+ {n}
+
+ where :math:`c_{ij}` is the number of occurrences of
+ :math:`\mathtt{u[k]} = i` and :math:`\mathtt{v[k]} = j` for
+ :math:`k < n`.
+
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
+
+ :Returns:
+ d : double
+ The Russell-Rao dissimilarity between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -428,12 +683,28 @@
def sokalmichener(u, v):
"""
- d = sokalmichener(u, v)
+ Computes the Sokal-Michener dissimilarity between two boolean vectors
+ ``u`` and ``v``, which is defined as
- Computes the Sokal-Michener dissimilarity between two boolean vectors
- u and v, 2R / (S + 2R) where c_{ij} is the number of occurrences of
- u[k] == i and v[k] == j for k < n and R = 2 * (c_{TF} + c{FT}) and
- S = c_{FF} + c_{TT}.
+ .. math::
+
+ \frac{2R}
+ {S + 2R}
+
+ where :math:`c_{ij}` is the number of occurrences of
+ :math:`\mathtt{u[k]} = i` and :math:`\mathtt{v[k]} = j` for
+ :math:`k < n`, :math:`R = 2(c_{TF} + c_{FT})` and
+ :math:`S = c_{FF} + c_{TT}`.
+
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
+
+ :Returns:
+ d : double
+ The Sokal-Michener dissimilarity between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -448,11 +719,27 @@
def sokalsneath(u, v):
"""
- d = sokalsneath(u, v)
+ Computes the Sokal-Sneath dissimilarity between two boolean vectors
+ ``u`` and ``v``,
- Computes the Sokal-Sneath dissimilarity between two boolean vectors
- u and v, 2R / (c_{TT} + 2R) where c_{ij} is the number of occurrences
- of u[k] == i and v[k] == j for k < n and R = 2 * (c_{TF} + c{FT}).
+ .. math::
+
+ \frac{2R}
+ {c_{TT} + 2R}
+
+ where :math:`c_{ij}` is the number of occurrences of
+ :math:`\mathtt{u[k]} = i` and :math:`\mathtt{v[k]} = j` for
+ :math:`k < n` and :math:`R = 2(c_{TF} + c_{FT})`.
+
+ :Parameters:
+ u : ndarray
+ An :math:`n`-dimensional vector.
+ v : ndarray
+ An :math:`n`-dimensional vector.
+
+ :Returns:
+ d : double
+ The Sokal-Sneath dissimilarity between vectors ``u`` and ``v``.
"""
u = np.asarray(u)
v = np.asarray(v)
@@ -465,176 +752,211 @@
def pdist(X, metric='euclidean', p=2, V=None, VI=None):
- """ Y = pdist(X, method='euclidean', p=2)
+ """
+ Computes the distance between m original observations in
+ n-dimensional space. Returns a condensed distance matrix Y. For
+ each :math:`i` and :math:`j` (where :math:`i < j < m`), the
+ metric ``dist(u=X[i], v=X[j])`` is computed and stored in the
+ :math:`ij`-th entry.
- Computes the distance between m original observations in
- n-dimensional space. Returns a condensed distance matrix Y.
- For each i and j (i<j), the metric dist(u=X[i], v=X[j]) is
- computed and stored in the ij'th entry. See squareform
- to learn how to retrieve this entry.
+ See ``squareform`` for information on how to calculate the index of
+ this entry or to convert the condensed distance matrix to a
+ redundant square matrix.
- 1. Y = pdist(X)
+ :Parameters:
+ X : ndarray
+ An m by n array of m original observations in an
+ n-dimensional space.
+ metric : string or function
+ The distance metric to use. The distance function can
+ be 'braycurtis', 'canberra', 'chebyshev', 'cityblock',
+ 'correlation', 'cosine', 'dice', 'euclidean', 'hamming',
+ 'jaccard', 'kulsinski', 'mahalanobis', 'matching',
+ 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean',
+ 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
- Computes the distance between m points using Euclidean distance
- (2-norm) as the distance metric between the points. The points
- are arranged as m n-dimensional row vectors in the matrix X.
+ :Returns:
+ Y : ndarray
+ A condensed distance matrix.
- 2. Y = pdist(X, 'minkowski', p)
+ Calling Conventions
+ -------------------
- Computes the distances using the Minkowski distance ||u-v||_p
- (p-norm) where p>=1.
+ 1. ``Y = pdist(X, 'euclidean')``
- 3. Y = pdist(X, 'cityblock')
+ Computes the distance between m points using Euclidean distance
+ (2-norm) as the distance metric between the points. The points
+ are arranged as m n-dimensional row vectors in the matrix X.
- Computes the city block or Manhattan distance between the
- points.
+ 2. ``Y = pdist(X, 'minkowski', p)``
- 4. Y = pdist(X, 'seuclidean', V=None)
+ Computes the distances using the Minkowski distance
+ :math:`||u-v||_p` (p-norm) where :math:`p \geq 1`.
- Computes the standardized Euclidean distance. The standardized
- Euclidean distance between two n-vectors u and v is
+ 3. ``Y = pdist(X, 'cityblock')``
- sqrt(\sum {(u_i-v_i)^2 / V[x_i]}).
+ Computes the city block or Manhattan distance between the
+ points.
- V is the variance vector; V[i] is the variance computed over all
+ 4. ``Y = pdist(X, 'seuclidean', V=None)``
+
+ Computes the standardized Euclidean distance. The standardized
+ Euclidean distance between two n-vectors ``u`` and ``v`` is
+
+ .. math::
+
+ \sqrt{\sum {(u_i-v_i)^2 / V[x_i]}}.
+
+ V is the variance vector; V[i] is the variance computed over all
the i'th components of the points. If not passed, it is
automatically computed.
- 5. Y = pdist(X, 'sqeuclidean')
+ 5. ``Y = pdist(X, 'sqeuclidean')``
- Computes the squared Euclidean distance ||u-v||_2^2 between
- the vectors.
+ Computes the squared Euclidean distance :math:`||u-v||_2^2` between
+ the vectors.
- 6. Y = pdist(X, 'cosine')
+ 6. ``Y = pdist(X, 'cosine')``
- Computes the cosine distance between vectors u and v,
+ Computes the cosine distance between vectors u and v,
- 1 - uv^T
- -----------
- |u|_2 |v|_2
+ .. math::
- where |*|_2 is the 2 norm of its argument *.
+ 1 - \frac{uv^T}
+ {{|u|}_2 {|v|}_2}
- 7. Y = pdist(X, 'correlation')
+ where :math:`|*|_2` is the 2-norm of its argument ``*``.
- Computes the correlation distance between vectors u and v. This is
+ 7. ``Y = pdist(X, 'correlation')``
- 1 - (u - n|u|_1)(v - n|v|_1)^T
- --------------------------------- ,
- |(u - n|u|_1)|_2 |(v - n|v|_1)|^T
+ Computes the correlation distance between vectors u and v. This is
- where |*|_1 is the Manhattan (or 1-norm) of its argument *,
- and n is the common dimensionality of the vectors.
+ .. math::
- 8. Y = pdist(X, 'hamming')
+ 1 - \frac{(u - \bar{u}){(v - \bar{v})}^T}
+ {{||u - \bar{u}||}_2 {||v - \bar{v}||}_2}
- Computes the normalized Hamming distance, or the proportion
- of those vector elements between two n-vectors u and v which
- disagree. To save memory, the matrix X can be of type boolean.
+ where :math:`\bar{u}` and :math:`\bar{v}` are the means of
+ the elements of ``u`` and ``v``, respectively, taken over
+ the common dimensionality of the vectors.
- 9. Y = pdist(X, 'jaccard')
+ 8. ``Y = pdist(X, 'hamming')``
- Computes the Jaccard distance between the points. Given two
- vectors, u and v, the Jaccard distance is the proportion of
- those elements u_i and v_i that disagree where at least one
- of them is non-zero.
+ Computes the normalized Hamming distance, or the proportion of
+ those vector elements between two n-vectors ``u`` and ``v``
+ which disagree. To save memory, the matrix ``X`` can be of type
+ boolean.
- 10. Y = pdist(X, 'chebyshev')
+ 9. ``Y = pdist(X, 'jaccard')``
- Computes the Chebyshev distance between the points. The
- Chebyshev distance between two n-vectors u and v is the maximum
- norm-1 distance between their respective elements. More
- precisely, the distance is given by
+ Computes the Jaccard distance between the points. Given two
+ vectors, ``u`` and ``v``, the Jaccard distance is the
+ proportion of those elements ``u[i]`` and ``v[i]`` that
+ disagree where at least one of them is non-zero.
- d(u,v) = max {|u_i-v_i|}.
+ 10. ``Y = pdist(X, 'chebyshev')``
- 11. Y = pdist(X, 'canberra')
+ Computes the Chebyshev distance between the points. The
+ Chebyshev distance between two n-vectors ``u`` and ``v`` is the
+ maximum norm-1 distance between their respective elements. More
+ precisely, the distance is given by
- Computes the Canberra distance between the points. The
- Canberra distance between two points u and v is
+ .. math::
- |u_1-v_1| |u_2-v_2| |u_n-v_n|
- d(u,v) = ----------- + ----------- + ... + -----------
- |u_1|+|v_1| |u_2|+|v_2| |u_n|+|v_n|
+ d(u,v) = \max_i {|u_i-v_i|}.
- 12. Y = pdist(X, 'braycurtis')
+ 11. ``Y = pdist(X, 'canberra')``
- Computes the Bray-Curtis distance between the points. The
- Bray-Curtis distance between two points u and v is
+ Computes the Canberra distance between the points. The
+ Canberra distance between two points ``u`` and ``v`` is
- |u_1-v_1| + |u_2-v_2| + ... + |u_n-v_n|
- d(u,v) = ---------------------------------------
- |u_1+v_1| + |u_2+v_2| + ... + |u_n+v_n|
+ .. math::
- 13. Y = pdist(X, 'mahalanobis', VI=None)
+ d(u,v) = \sum_i \frac{|u_i-v_i|}
+ {|u_i|+|v_i|}
+
- Computes the Mahalanobis distance between the points. The
- Mahalanobis distance between two points u and v is
- (u-v)(1/V)(u-v)^T
- where (1/V) is the inverse covariance. If VI is not None,
- VI will be used as the inverse covariance matrix.
+ 12. ``Y = pdist(X, 'braycurtis')``
- 14. Y = pdist(X, 'yule')
+ Computes the Bray-Curtis distance between the points. The
+ Bray-Curtis distance between two points ``u`` and ``v`` is
- Computes the Yule distance between each pair of boolean
- vectors. (see yule function documentation)
- 15. Y = pdist(X, 'matching')
+ .. math::
- Computes the matching distance between each pair of boolean
- vectors. (see matching function documentation)
+ d(u,v) = \frac{\sum_i {|u_i-v_i|}}
+ {\sum_i {|u_i+v_i|}}
- 16. Y = pdist(X, 'dice')
+ 13. ``Y = pdist(X, 'mahalanobis', VI=None)``
- Computes the Dice distance between each pair of boolean
- vectors. (see dice function documentation)
+ Computes the Mahalanobis distance between the points. The
+ Mahalanobis distance between two points ``u`` and ``v`` is
+ :math:`(u-v)(1/V)(u-v)^T` where :math:`(1/V)` (the ``VI``
+ variable) is the inverse covariance. If ``VI`` is not None,
+ ``VI`` will be used as the inverse covariance matrix.
- 17. Y = pdist(X, 'kulsinski')
+ 14. ``Y = pdist(X, 'yule')``
- Computes the Kulsinski distance between each pair of
- boolean vectors. (see kulsinski function documentation)
+ Computes the Yule distance between each pair of boolean
+ vectors. (see yule function documentation)
- 17. Y = pdist(X, 'rogerstanimoto')
+ 15. ``Y = pdist(X, 'matching')``
- Computes the Rogers-Tanimoto distance between each pair of
- boolean vectors. (see rogerstanimoto function documentation)
+ Computes the matching distance between each pair of boolean
+ vectors. (see matching function documentation)
- 18. Y = pdist(X, 'russellrao')
+ 16. ``Y = pdist(X, 'dice')``
- Computes the Russell-Rao distance between each pair of
- boolean vectors. (see russellrao function documentation)
+ Computes the Dice distance between each pair of boolean
+ vectors. (see dice function documentation)
- 19. Y = pdist(X, 'sokalmichener')
+ 17. ``Y = pdist(X, 'kulsinski')``
- Computes the Sokal-Michener distance between each pair of
- boolean vectors. (see sokalmichener function documentation)
+ Computes the Kulsinski distance between each pair of
+ boolean vectors. (see kulsinski function documentation)
- 20. Y = pdist(X, 'sokalsneath')
+ 18. ``Y = pdist(X, 'rogerstanimoto')``
- Computes the Sokal-Sneath distance between each pair of
- boolean vectors. (see sokalsneath function documentation)
+ Computes the Rogers-Tanimoto distance between each pair of
+ boolean vectors. (see rogerstanimoto function documentation)
- 21. Y = pdist(X, f)
+ 19. ``Y = pdist(X, 'russellrao')``
- Computes the distance between all pairs of vectors in X
- using the user supplied 2-arity function f. For example,
- Euclidean distance between the vectors could be computed
- as follows,
+ Computes the Russell-Rao distance between each pair of
+ boolean vectors. (see russellrao function documentation)
- dm = pdist(X, (lambda u, v: np.sqrt(((u-v)*(u-v).T).sum())))
+ 20. ``Y = pdist(X, 'sokalmichener')``
- Note that you should avoid passing a reference to one of
- the distance functions defined in this library. For example,
+ Computes the Sokal-Michener distance between each pair of
+ boolean vectors. (see sokalmichener function documentation)
- dm = pdist(X, sokalsneath)
+ 21. ``Y = pdist(X, 'sokalsneath')``
- would calculate the pair-wise distances between the vectors
- in X using the Python function sokalsneath. This would result
- in sokalsneath being called {n \choose 2} times, which is
- inefficient. Instead, the optimized C version is more
- efficient, and we call it using the following syntax.
+ Computes the Sokal-Sneath distance between each pair of
+ boolean vectors. (see sokalsneath function documentation)
- dm = pdist(X, 'sokalsneath')
+ 22. ``Y = pdist(X, f)``
+
+ Computes the distance between all pairs of vectors in X
+ using the user supplied 2-arity function f. For example,
+ Euclidean distance between the vectors could be computed
+ as follows::
+
+ dm = pdist(X, (lambda u, v: np.sqrt(((u-v)*(u-v).T).sum())))
+
+ Note that you should avoid passing a reference to one of
+ the distance functions defined in this library. For example::
+
+ dm = pdist(X, sokalsneath)
+
+ would calculate the pair-wise distances between the vectors in
+ X using the Python function sokalsneath. This would result in
+ sokalsneath being called :math:`{m \choose 2}` times, which
+ is inefficient. Instead, the optimized C version is more
+ efficient, and we call it using the following syntax::
+
+ dm = pdist(X, 'sokalsneath')
+
"""
# 21. Y = pdist(X, 'test_Y')
#
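
The pdist docstring above describes a condensed distance matrix with one entry per pair of observations (i, j) with i < j, and a calling convention that accepts a user-supplied two-argument function. The following is a minimal pure-Python sketch of that idea, not the library's implementation. The names pdist_sketch and condensed_index are hypothetical, and the row-by-row ordering of pairs is an assumption made for illustration.

```python
import math


def minkowski(u, v, p):
    """Minkowski distance (sum_i |u_i - v_i|^p)^(1/p) between two sequences."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def pdist_sketch(X, metric):
    """Hypothetical helper: distances for every pair (i, j) with i < j.

    Pairs are visited row by row: (0,1), (0,2), ..., (1,2), ...
    The metric is called once per pair, i.e. m-choose-2 times.
    """
    m = len(X)
    return [metric(X[i], X[j]) for i in range(m) for j in range(i + 1, m)]


def condensed_index(m, i, j):
    """Hypothetical helper: position of pair (i, j), i < j < m, in the
    condensed list produced by pdist_sketch."""
    return m * i + j - ((i + 2) * (i + 1)) // 2


# Four observations in a 2-dimensional space (m = 4, n = 2).
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]]

# A user-supplied 2-arity function, here Minkowski with p = 2 (Euclidean).
Y = pdist_sketch(X, lambda u, v: minkowski(u, v, 2))

print(len(Y), math.comb(len(X), 2))   # number of pairs, both ways
print(Y[condensed_index(len(X), 1, 2)])  # distance between X[1] and X[2]
```

Because the metric is an ordinary Python callable invoked once per pair, this sketch also illustrates why the docstring recommends passing a metric name such as 'sokalsneath' rather than the Python function when an optimized version exists.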