[Scipy-svn] r4667 - in trunk/scipy/cluster: . tests
scipy-svn at scipy.org
Sat Aug 23 13:56:26 EDT 2008
Author: damian.eads
Date: 2008-08-23 12:56:19 -0500 (Sat, 23 Aug 2008)
New Revision: 4667
Modified:
trunk/scipy/cluster/distance.py
trunk/scipy/cluster/hierarchy.py
trunk/scipy/cluster/tests/test_distance.py
trunk/scipy/cluster/tests/test_hierarchy.py
Log:
Converted the scipy.cluster.hierarchy header so it conforms to RST. Fixed minor bug in test program when moving some distance-related functions from scipy.cluster.hierarchy to scipy.cluster.distance.
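The test import changes in this revision show that `numobs_dm` and `numobs_y` now live in `scipy.cluster.distance`. The sketch below illustrates what "number of observations in a distance matrix" means for the two representations. It uses plain NumPy and hypothetical helper names (`condense`, `num_obs_square`, `num_obs_condensed`), not the SciPy functions themselves. The square-root step mirrors the `np.ceil(np.sqrt(s[0] * 2))` expression in the `linkage` hunk: a condensed matrix for n observations has m = n(n-1)/2 entries, and because (n-1)^2 < 2m < n^2 for n >= 2, taking the ceiling of sqrt(2m) recovers n. Later SciPy releases provide equivalents as `scipy.spatial.distance.num_obs_dm` and `num_obs_y`.

```python
import numpy as np

def condense(D):
    """Return the condensed (upper triangle, row-major) form of a square matrix."""
    D = np.asarray(D, dtype=float)
    i, j = np.triu_indices(D.shape[0], k=1)  # indices strictly above the diagonal
    return D[i, j]

def num_obs_square(D):
    """Observations in a square (redundant) n-by-n distance matrix: simply n."""
    D = np.asarray(D)
    if D.ndim != 2 or D.shape[0] != D.shape[1]:
        raise ValueError("expected a square 2-D matrix")
    return D.shape[0]

def num_obs_condensed(y):
    """Observations n in a condensed matrix of length m = n*(n-1)/2."""
    y = np.asarray(y)
    if y.ndim != 1:
        raise ValueError("expected a 1-D condensed matrix")
    m = y.shape[0]
    if m == 0:
        # Assumption of this sketch: length 0 fits both n = 0 and n = 1, so reject it.
        raise ValueError("an empty condensed matrix is ambiguous")
    n = int(np.ceil(np.sqrt(2 * m)))  # (n-1)^2 < 2m < n^2, so the ceiling is n
    if n * (n - 1) // 2 != m:
        raise ValueError("length %d is not n*(n-1)/2 for any n" % m)
    return n
```

For example, `condense([[0, 1, 2], [1, 0, 3], [2, 3, 0]])` gives the three pairwise distances `[1., 2., 3.]`, and both representations report three observations.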
Modified: trunk/scipy/cluster/distance.py
===================================================================
--- trunk/scipy/cluster/distance.py 2008-08-23 05:21:05 UTC (rev 4666)
+++ trunk/scipy/cluster/distance.py 2008-08-23 17:56:19 UTC (rev 4667)
@@ -16,7 +16,8 @@
+------------------+-------------------------------------------------+
Predicates for checking the validity of distance matrices, both
-condensed and redundant.
+condensed and redundant. Also contained in this module are functions
+for computing the number of observations in a distance matrix.
+------------------+-------------------------------------------------+
|*Function* | *Description* |
@@ -25,6 +26,8 @@
+------------------+-------------------------------------------------+
|is_valid_y | checks for a valid condensed distance matrix.
+------------------+-------------------------------------------------+
+|numobs_dm # of observations in a distance matrix.
++------------------+-------------------------------------------------+
Distance functions between two vectors ``u`` and ``v``. Computing
distances over a large collection of vectors is inefficient for these
Modified: trunk/scipy/cluster/hierarchy.py
===================================================================
--- trunk/scipy/cluster/hierarchy.py 2008-08-23 05:21:05 UTC (rev 4666)
+++ trunk/scipy/cluster/hierarchy.py 2008-08-23 17:56:19 UTC (rev 4667)
@@ -1,107 +1,152 @@
"""
------------------------------------------
-Hierarchical Clustering Library for Scipy
- Copyright (C) Damian Eads, 2007-2008.
- New BSD License
------------------------------------------
+Function Reference
+------------------
-Flat cluster formation
+These functions cut hierarchical clusterings into flat clusterings
+or find the roots of the forest formed by a cut by providing the flat
+cluster ids of each observation.
- fcluster forms flat clusters from hierarchical clusters.
- fclusterdata forms flat clusters directly from data.
- leaders singleton root nodes for flat cluster.
++------------------+-------------------------------------------------+
+|*Function* | *Description* |
++------------------+-------------------------------------------------+
+|fcluster |forms flat clusters from hierarchical clusters. |
++------------------+-------------------------------------------------+
+|fclusterdata |forms flat clusters directly from data. |
++------------------+-------------------------------------------------+
+|leaders |singleton root nodes for flat cluster. |
++------------------+-------------------------------------------------+
-Agglomerative cluster formation
+These are routines for agglomerative clustering.
- linkage agglomeratively clusters original observations.
- single the single/min/nearest algorithm. (alias)
- complete the complete/max/farthest algorithm. (alias)
- average the average/UPGMA algorithm. (alias)
- weighted the weighted/WPGMA algorithm. (alias)
- centroid the centroid/UPGMC algorithm. (alias)
- median the median/WPGMC algorithm. (alias)
- ward the Ward/incremental algorithm. (alias)
++------------------+-------------------------------------------------+
+|*Function* | *Description* |
++------------------+-------------------------------------------------+
+|linkage |agglomeratively clusters original observations. |
++------------------+-------------------------------------------------+
+|single |the single/min/nearest algorithm. (alias) |
++------------------+-------------------------------------------------+
+|complete |the complete/max/farthest algorithm. (alias) |
++------------------+-------------------------------------------------+
+|average |the average/UPGMA algorithm. (alias) |
++------------------+-------------------------------------------------+
+|weighted |the weighted/WPGMA algorithm. (alias) |
++------------------+-------------------------------------------------+
+|centroid |the centroid/UPGMC algorithm. (alias) |
++------------------+-------------------------------------------------+
+|median |the median/WPGMC algorithm. (alias) |
++------------------+-------------------------------------------------+
+|ward |the Ward/incremental algorithm. (alias) |
++------------------+-------------------------------------------------+
-Statistic computations on hierarchies
+These routines compute statistics on hierarchies.
- cophenet computes the cophenetic distance between leaves.
- from_mlab_linkage converts a linkage produced by MATLAB(TM).
- inconsistent the inconsistency coefficients for cluster.
- maxinconsts the maximum inconsistency coefficient for each cluster.
- maxdists the maximum distance for each cluster.
- maxRstat the maximum specific statistic for each cluster.
- to_mlab_linkage converts a linkage to one MATLAB(TM) can understand.
++------------------+-------------------------------------------------+
+|*Function* | *Description* |
++------------------+-------------------------------------------------+
+|cophenet |computes the cophenetic distance between leaves. |
++------------------+-------------------------------------------------+
+|from_mlab_linkage |converts a linkage produced by MATLAB(TM). |
++------------------+-------------------------------------------------+
+|inconsistent |the inconsistency coefficients for cluster. |
++------------------+-------------------------------------------------+
+|maxinconsts |the maximum inconsistency coefficient for each |
+| |cluster. |
++------------------+-------------------------------------------------+
+|maxdists |the maximum distance for each cluster. |
++------------------+-------------------------------------------------+
+|maxRstat |the maximum specific statistic for each cluster. |
++------------------+-------------------------------------------------+
+|to_mlab_linkage |converts a linkage to one MATLAB(TM) can |
+| |understand. |
++------------------+-------------------------------------------------+
-Visualization
+Routines for visualizing flat clusters.
- dendrogram visualizes linkages (requires matplotlib).
++------------------+-------------------------------------------------+
+|*Function* | *Description* |
++------------------+-------------------------------------------------+
+|dendrogram |visualizes linkages (requires matplotlib). |
++------------------+-------------------------------------------------+
-Tree representations of hierarchies
+These are data structures and routines for representing hierarchies as
+tree objects.
- cnode represents cluster nodes in a cluster hierarchy.
- lvlist a left-to-right traversal of the leaves.
- totree represents a linkage matrix as a tree object.
++------------------+-------------------------------------------------+
+|*Function* | *Description* |
++------------------+-------------------------------------------------+
+|cnode |represents cluster nodes in a cluster hierarchy. |
++------------------+-------------------------------------------------+
+|lvlist |a left-to-right traversal of the leaves. |
++------------------+-------------------------------------------------+
+|totree |represents a linkage matrix as a tree object. |
++------------------+-------------------------------------------------+
-Predicates
+These are predicates for checking the validity of linkage and
+inconsistency matrices, both condensed and redundant.
- is_valid_im checks for a valid inconsistency matrix.
- is_valid_linkage checks for a valid hierarchical clustering.
- is_isomorphic checks if two flat clusterings are isomorphic.
- is_monotonic checks if a linkage is monotonic.
- Z_y_correspond checks for validity of distance matrix given a linkage.
++------------------+-------------------------------------------------+
+|*Function* | *Description* |
++------------------+-------------------------------------------------+
+|is_valid_im |checks for a valid inconsistency matrix. |
++------------------+-------------------------------------------------+
+|is_valid_linkage |checks for a valid hierarchical clustering. |
++------------------+-------------------------------------------------+
+|is_isomorphic |checks if two flat clusterings are isomorphic. |
++------------------+-------------------------------------------------+
+|is_monotonic |checks if a linkage is monotonic. |
++------------------+-------------------------------------------------+
+|Z_y_correspond |checks for validity of distance matrix given a |
+| |linkage. |
++------------------+-------------------------------------------------+
-Utility Functions
- numobs_dm # of observations in a distance matrix.
+* MATLAB and MathWorks are registered trademarks of The MathWorks, Inc.
-Legal stuff
+* Mathematica is a registered trademark of The Wolfram Research, Inc.
- copying Displays the license for this package.
+References
+----------
- MATLAB and MathWorks are registered trademarks of The MathWorks, Inc.
- Mathematica is a registered trademark of The Wolfram Research, Inc.
+.. [Sta07] "Statistics toolbox." API Reference Documentation. The MathWorks.
+ http://www.mathworks.com/access/helpdesk/help/toolbox/stats/.
+ Accessed October 1, 2007.
-References:
+.. [Mti07] "Hierarchical clustering." API Reference Documentation.
+ The Wolfram Research, Inc.
+ http://reference.wolfram.com/mathematica/HierarchicalClustering/tutorial/HierarchicalClustering.html.
+ Accessed October 1, 2007.
- [1] "Statistics toolbox." API Reference Documentation. The MathWorks.
- http://www.mathworks.com/access/helpdesk/help/toolbox/stats/.
- Accessed October 1, 2007.
+.. [Gow69] Gower, JC and Ross, GJS. "Minimum Spanning Trees and Single Linkage
+ Cluster Analysis." Applied Statistics. 18(1): pp. 54--64. 1969.
- [2] "Hierarchical clustering." API Reference Documentation.
- The Wolfram Research, Inc. http://reference.wolfram.com/...
- ...mathematica/HierarchicalClustering/tutorial/...
- HierarchicalClustering.html. Accessed October 1, 2007.
+.. [War63] Ward Jr, JH. "Hierarchical grouping to optimize an objective
+ function." Journal of the American Statistical Association. 58(301):
+ pp. 236--44. 1963.
- [3] Gower, JC and Ross, GJS. "Minimum Spanning Trees and Single Linkage
- Cluster Analysis." Applied Statistics. 18(1): pp. 54--64. 1969.
+.. [Joh66] Johnson, SC. "Hierarchical clustering schemes." Psychometrika.
+ 32(2): pp. 241--54. 1966.
- [4] Ward Jr, JH. "Hierarchical grouping to optimize an objective
- function." Journal of the American Statistical Association. 58(301):
- pp. 236--44. 1963.
+.. [Sne62] Sneath, PH and Sokal, RR. "Numerical taxonomy." Nature. 193: pp.
+ 855--60. 1962.
- [5] Johnson, SC. "Hierarchical clustering schemes." Psychometrika.
- 32(2): pp. 241--54. 1966.
+.. [Bat95] Batagelj, V. "Comparing resemblance measures." Journal of
+ Classification. 12: pp. 73--90. 1995.
- [6] Sneath, PH and Sokal, RR. "Numerical taxonomy." Nature. 193: pp.
- 855--60. 1962.
+.. [Sok58] Sokal, RR and Michener, CD. "A statistical method for evaluating
+ systematic relationships." Scientific Bulletins. 38(22):
+ pp. 1409--38. 1958.
- [7] Batagelj, V. "Comparing resemblance measures." Journal of
- Classification. 12: pp. 73--90. 1995.
+.. [Ede79] Edelbrock, C. "Mixture model tests of hierarchical clustering
+ algorithms: the problem of classifying everybody." Multivariate
+ Behavioral Research. 14: pp. 367--84. 1979.
- [8] Sokal, RR and Michener, CD. "A statistical method for evaluating
- systematic relationships." Scientific Bulletins. 38(22):
- pp. 1409--38. 1958.
+.. [Jai88] Jain, A., and Dubes, R., "Algorithms for Clustering Data."
+ Prentice-Hall. Englewood Cliffs, NJ. 1988.
- [9] Edelbrock, C. "Mixture model tests of hierarchical clustering
- algorithms: the problem of classifying everybody." Multivariate
- Behavioral Research. 14: pp. 367--84. 1979.
+.. [Fis36] Fisher, RA "The use of multiple measurements in taxonomic
+ problems." Annals of Eugenics, 7(2): 179-188. 1936
-[10] Jain, A., and Dubes, R., "Algorithms for Clustering Data."
- Prentice-Hall. Englewood Cliffs, NJ. 1988.
-
-[11] Fisher, RA "The use of multiple measurements in taxonomic
- problems." Annals of Eugenics, 7(2): 179-188. 1936
"""
_copyingtxt="""
@@ -423,7 +468,7 @@
s = y.shape
if len(s) == 1:
- is_valid_y(y, throw=True, name='y')
+ distance.is_valid_y(y, throw=True, name='y')
d = np.ceil(np.sqrt(s[0] * 2))
if method not in _cpy_non_euclid_methods.keys():
raise ValueError("Valid methods when the raw observations are omitted are 'single', 'complete', 'weighted', and 'average'.")
@@ -719,7 +764,7 @@
Y = args[1]
Ys = Y.shape
- is_valid_y(Y, throw=True, name='Y')
+ distance.is_valid_y(Y, throw=True, name='Y')
z = zz.mean()
y = Y.mean()
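Both `hierarchy.py` hunks now call `distance.is_valid_y(..., throw=True, name=...)`, which either reports validity or raises an error naming the offending argument. The sketch below illustrates that calling pattern with a hypothetical `check_condensed` function. Its specific checks (one-dimensional, triangular-number length, finite non-negative entries) are assumptions chosen for illustration and are not taken from SciPy's implementation.

```python
import numpy as np

def check_condensed(y, throw=False, name=None):
    """Hypothetical validity predicate for a condensed distance matrix.

    Returns True/False, or with throw=True raises ValueError naming `name`.
    """
    label = name if name is not None else "input"
    y = np.asarray(y)
    problem = None
    if y.ndim != 1:
        problem = "%s must be one-dimensional" % label
    else:
        m = y.shape[0]
        n = int(np.ceil(np.sqrt(2 * m)))  # candidate number of observations
        if n * (n - 1) // 2 != m:
            problem = "length of %s (%d) is not n*(n-1)/2 for any n" % (label, m)
        # Check finiteness first so NaN never reaches the < 0 comparison.
        elif not np.all(np.isfinite(y)) or np.any(y < 0):
            problem = "%s must hold finite, non-negative distances" % label
    if problem is None:
        return True
    if throw:
        raise ValueError(problem)
    return False
```

Called as `check_condensed(Y, throw=True, name='Y')`, a length-2 array raises a `ValueError` whose message mentions `Y`, which is the kind of diagnostic the `name` argument in the hunks above is there to provide.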
Modified: trunk/scipy/cluster/tests/test_distance.py
===================================================================
--- trunk/scipy/cluster/tests/test_distance.py 2008-08-23 05:21:05 UTC (rev 4666)
+++ trunk/scipy/cluster/tests/test_distance.py 2008-08-23 17:56:19 UTC (rev 4667)
@@ -39,8 +39,8 @@
import numpy as np
from numpy.testing import *
-from scipy.cluster.hierarchy import linkage, from_mlab_linkage, numobs_dm, numobs_y, numobs_linkage
-from scipy.cluster.distance import squareform, pdist, matching, jaccard, dice, sokalsneath, rogerstanimoto, russellrao, yule
+from scipy.cluster.hierarchy import linkage, from_mlab_linkage, numobs_linkage
+from scipy.cluster.distance import squareform, pdist, matching, jaccard, dice, sokalsneath, rogerstanimoto, russellrao, yule, numobs_dm, numobs_y
#from scipy.cluster.hierarchy import pdist, euclidean
Modified: trunk/scipy/cluster/tests/test_hierarchy.py
===================================================================
--- trunk/scipy/cluster/tests/test_hierarchy.py 2008-08-23 05:21:05 UTC (rev 4666)
+++ trunk/scipy/cluster/tests/test_hierarchy.py 2008-08-23 17:56:19 UTC (rev 4667)
@@ -39,8 +39,8 @@
import numpy as np
from numpy.testing import *
-from scipy.cluster.hierarchy import linkage, from_mlab_linkage, numobs_dm, numobs_y, numobs_linkage, inconsistent
-from scipy.cluster.distance import squareform, pdist, matching, jaccard, dice, sokalsneath, rogerstanimoto, russellrao, yule
+from scipy.cluster.hierarchy import linkage, from_mlab_linkage, numobs_linkage, inconsistent
+from scipy.cluster.distance import squareform, pdist, matching, jaccard, dice, sokalsneath, rogerstanimoto, russellrao, yule, numobs_dm, numobs_y
_tdist = np.array([[0, 662, 877, 255, 412, 996],
[662, 0, 295, 468, 268, 400],