[Numpy-svn] r4804 - branches/maskedarray/numpy/ma

Wed Feb 13 20:32:38 EST 2008

Author: pierregm
Date: 2008-02-13 19:32:35 -0600 (Wed, 13 Feb 2008)
New Revision: 4804

Modified:
   branches/maskedarray/numpy/ma/API_CHANGES.txt
   branches/maskedarray/numpy/ma/extras.py
   branches/maskedarray/numpy/ma/morestats.py
   branches/maskedarray/numpy/ma/mstats.py
Log:
numpy.ma : docs + API_CHANGES.txt updates

Modified: branches/maskedarray/numpy/ma/API_CHANGES.txt
===================================================================

--- branches/maskedarray/numpy/ma/API_CHANGES.txt	2008-02-14 01:32:20 UTC (rev 4803)
+++ branches/maskedarray/numpy/ma/API_CHANGES.txt	2008-02-14 01:32:35 UTC (rev 4804)
@@ -4,6 +4,52 @@
 API changes in the new masked array implementation
 ==================================================
 
+Masked arrays are subclasses of ndarray
+---------------------------------------
+
+Unlike in the original implementation, masked arrays are now subclasses of ndarray::
+
+  >>> x = masked_array([1,2,3],mask=[0,0,1])
+  >>> print isinstance(x, numpy.ndarray)
+  True
+
+
+``_data`` returns a view of the masked array
+--------------------------------------------
+
+Masked arrays are composed of a ``_data`` part and a ``_mask``. Accessing the
+``_data`` part returns a regular ndarray or one of its subclasses, depending
+on the type of the initial data::
+
+  >>> x = masked_array(numpy.matrix([[1,2],[3,4]]),mask=[[0,0],[0,1]])
+  >>> print x._data
+  [[1 2]
+   [3 4]]
+  >>> print type(x._data)
+  <class 'numpy.core.defmatrix.matrix'>
+
+
+In practice, ``_data`` is implemented as a property, not as an attribute:
+every access creates a new view of the data. As a result, simple identity
+tests such as the following one fail::
+
+  >>> x._data is x._data
+  False
+
+
+``filled(x)`` can return a subclass of ndarray
+----------------------------------------------
+
+The function ``filled(a)`` returns an array of the same type as ``a._data``::
+
+  >>> x = masked_array(numpy.matrix([[1,2],[3,4]]),mask=[[0,0],[0,1]])
+  >>> y = filled(x)
+  >>> print type(y)
+  <class 'numpy.core.defmatrix.matrix'>
+  >>> y
+  matrix([[     1,      2],
+          [     3, 999999]])
+
+
 ``put``, ``putmask`` behave like their ndarray counterparts
 -----------------------------------------------------------
 
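A quick way to check the three behaviours described in this hunk (ndarray subclassing, ``_data`` returning a fresh view on each access, and ``filled`` keeping the subclass of ``_data``) is the sketch below. It targets a current NumPy release rather than the 2008 branch, and ``Tagged`` is a hypothetical ndarray subclass used in place of ``numpy.matrix``, which NumPy now discourages.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical ndarray subclass standing in for numpy.matrix."""


base = np.arange(4).view(Tagged)
x = np.ma.masked_array(base, mask=[0, 0, 0, 1])

# Masked arrays are ndarray subclasses.
is_ndarray = isinstance(x, np.ndarray)

# ``_data`` is a property: every access builds a new view of the same memory,
# so two accesses are never the same object.
same_object = x._data is x._data
data_type = type(x._data)

# ``filled`` returns an array of the same type as ``_data``,
# with the masked entries replaced by the fill value.
y = np.ma.filled(x, fill_value=-1)
```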
@@ -66,3 +112,27 @@
     File "<stdin>", line 1, in <module>
   ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
 
+
+==================================
+New features (non exhaustive list)
+==================================
+
+``mr_``
+-------
+
+``mr_`` mimics the behavior of ``r_`` for masked arrays.
+
+``anom``
+--------
+
+The ``anom`` method returns the deviations from the average (anomalies).
+
+``varu`` and ``stdu``
+---------------------
+
+These methods return unbiased estimates of the variance and the standard
+deviation, respectively. The unbiased estimates divide the sum of the squared
+anomalies by ``n-1`` instead of ``n`` (as the biased estimates do), where ``n``
+is the number of unmasked elements along the given axis.
+
+

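The new features listed in this file can be exercised with current NumPy as sketched below. ``mr_`` and ``anom`` exist under those names in ``numpy.ma``; ``varu`` and ``stdu`` are assumed not to, so the sketch uses ``var(ddof=1)`` as the unbiased variance and checks it against the ``n-1`` formula described above.

```python
import numpy as np

# mr_ concatenates like np.r_ but keeps the masks of its inputs.
a = np.ma.array([1, 2, 3], mask=[0, 1, 0])
b = np.ma.array([4, 5])
joined = np.ma.mr_[a, b]

# anom(): deviations from the mean of the unmasked values.
x = np.ma.array([1.0, 2.0, 3.0, 10.0], mask=[0, 0, 0, 1])
anomalies = x.anom()

# Unbiased variance: sum of squared anomalies divided by n - 1,
# where n counts only the unmasked elements.
n = x.count()
var_unbiased_manual = (anomalies ** 2).sum() / (n - 1)
var_unbiased = x.var(ddof=1)  # current NumPy spelling of the unbiased estimate
var_biased = x.var()          # divides by n instead
```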
Modified: branches/maskedarray/numpy/ma/extras.py
===================================================================
--- branches/maskedarray/numpy/ma/extras.py	2008-02-14 01:32:20 UTC (rev 4803)
+++ branches/maskedarray/numpy/ma/extras.py	2008-02-14 01:32:35 UTC (rev 4804)
@@ -219,6 +219,20 @@
     """Execute func1d(arr[i],*args) where func1d takes 1-D arrays and
     arr is an N-d array.  i varies so as to apply the function along
     the given axis for each 1-d subarray in arr.
+    
+    Parameters
+    ----------
+        func1d : function
+            The 1D function to apply along the given axis.
+        axis : int
+            Axis along which to apply the function.
+        arr : ndarray
+            Array on which the function is applied.
+        args : tuple
+            Additional positional arguments passed to func1d.
+        kwargs : dict
+            Additional keyword arguments passed to func1d.
+    
     """
     arr = core.array(arr, copy=False, subok=True)
     nd = arr.ndim
@@ -542,6 +556,7 @@
     Notes
     -----
         The first argument is not conjugated.
+        The function only works with arrays of at most 2 dimensions.
 
     """
     #TODO: Works only with 2D arrays. There should be a way to get it to run with higher dimension

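The parameters documented for ``apply_along_axis`` can be illustrated with the sketch below, assuming a current NumPy where the function is exposed as ``numpy.ma.apply_along_axis``. ``masked_row_sum`` is a hypothetical helper; each call receives one masked 1D row, and the extra ``scale`` keyword shows how ``kwargs`` reach ``func1d``.

```python
import numpy as np

arr = np.ma.array([[1, 2, 3],
                   [4, 5, 6]],
                  mask=[[0, 0, 1],
                        [0, 0, 0]])


def masked_row_sum(row, scale=1):
    # ``row`` is a 1D masked slice of ``arr``; .sum() skips masked entries.
    return row.sum() * scale


# Apply along axis 1 (one call per row); keyword arguments are forwarded.
row_sums = np.ma.apply_along_axis(masked_row_sum, 1, arr)
scaled = np.ma.apply_along_axis(masked_row_sum, 1, arr, scale=10)
```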
Modified: branches/maskedarray/numpy/ma/morestats.py
===================================================================
--- branches/maskedarray/numpy/ma/morestats.py	2008-02-14 01:32:20 UTC (rev 4803)
+++ branches/maskedarray/numpy/ma/morestats.py	2008-02-14 01:32:35 UTC (rev 4804)
@@ -40,22 +40,26 @@
     """Computes quantile estimates with the Harrell-Davis method, where the estimates
 are calculated as a weighted linear combination of order statistics.
 
-*Parameters* :
-    data: {ndarray}
+Parameters
+----------
+    data : ndarray
         Data array.
-    prob: {sequence}
+    prob : sequence
         Sequence of quantiles to compute.
-    axis : {integer}
+    axis : int
         Axis along which to compute the quantiles. If None, use a flattened array.
-    var : {boolean}
+    var : boolean
         Whether to return the variance of the estimate.
 
-*Returns*
+Returns
+-------
     A (p,) array of quantiles (if ``var`` is False), or a (2,p) array of quantiles
     and variances (if ``var`` is True), where ``p`` is the number of quantiles.
 
-:Note:
+Notes
+-----
     The function is restricted to 2D arrays.
+    
     """
     def _hd_1D(data,prob,var):
         "Computes the HD quantiles for a 1D array. Returns nan for invalid data."
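The ``hdquantiles`` docstring describes the estimate as a weighted linear combination of order statistics. The sketch below spells out the standard Harrell-Davis weights for 1D input: the weight of the i-th order statistic is the Beta(p(n+1), (1-p)(n+1)) probability of the interval [(i-1)/n, i/n]. To stay dependency-free it approximates that probability with a midpoint rule instead of a library incomplete-beta call; ``hd_weights`` and ``hd_quantile`` are hypothetical names, not the module's API.

```python
import math

import numpy as np


def _beta_pdf(t, a, b):
    # Density of Beta(a, b), normalised through log-gamma for stability.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1.0) * np.log(t) + (b - 1.0) * np.log1p(-t))


def hd_weights(n, p, steps=4000):
    # W_i = I_{i/n}(a, b) - I_{(i-1)/n}(a, b), a = p(n+1), b = (1-p)(n+1),
    # approximated by a midpoint rule on each interval. Requires 0 < p < 1.
    a, b = p * (n + 1), (1.0 - p) * (n + 1)
    weights = np.empty(n)
    for i in range(n):
        lo, hi = i / n, (i + 1) / n
        width = (hi - lo) / steps
        mids = lo + width * (np.arange(steps) + 0.5)  # strictly inside (0, 1)
        weights[i] = _beta_pdf(mids, a, b).sum() * width
    return weights


def hd_quantile(data, p):
    # Masked values are dropped before the order statistics are formed.
    x = np.sort(np.ma.asarray(data, dtype=float).compressed())
    return float(np.dot(hd_weights(x.size, p), x))


sample = np.ma.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0], mask=[0, 0, 0, 0, 0, 1])
median_hd = hd_quantile(sample, 0.5)
```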
@@ -102,13 +106,15 @@
 def hdmedian(data, axis=-1, var=False):
     """Returns the Harrell-Davis estimate of the median along the given axis.
 
-*Parameters* :
-    data: {ndarray}
+Parameters
+----------
+    data : ndarray
         Data array.
-    axis : {integer}
+    axis : int
         Axis along which to compute the quantiles. If None, use a flattened array.
-    var : {boolean}
+    var : boolean
         Whether to return the variance of the estimate.
+        
     """
     result = hdquantiles(data,[0.5], axis=axis, var=var)
     return result.squeeze()
@@ -119,16 +125,19 @@
     """Computes the standard error of the Harrell-Davis quantile estimates by jackknife.
 
 
-*Parameters* :
-    data: {ndarray}
+Parameters
+----------
+    data : ndarray
         Data array.
-    prob: {sequence}
+    prob : sequence
         Sequence of quantiles to compute.
-    axis : {integer}
+    axis : int
         Axis along which to compute the quantiles. If None, use a flattened array.
 
-*Note*:
+Notes
+-----
     The function is restricted to 2D arrays.
+    
     """
     def _hdsd_1D(data,prob):
         "Computes the std error for 1D arrays."
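``hdquantiles_sd`` is described as a jackknife estimate. The generic jackknife standard error recomputes the estimator on each leave-one-out sample and rescales the spread of those replicates by (n-1)/n. The hypothetical ``jackknife_se`` below sketches that recipe for an arbitrary estimator; the module presumably plugs in the Harrell-Davis quantile, whose body is not shown in this diff. For the sample mean the jackknife reproduces s/sqrt(n) exactly, which makes a convenient check.

```python
import numpy as np


def jackknife_se(data, estimator):
    # Masked values are discarded; the estimator sees plain 1D ndarrays.
    x = np.ma.asarray(data, dtype=float).compressed()
    n = x.size
    # Leave-one-out replicates: estimator applied to x without x[i].
    replicates = np.array([estimator(np.delete(x, i)) for i in range(n)])
    # Jackknife variance: (n-1)/n * sum of squared deviations of the replicates.
    spread = np.sum((replicates - replicates.mean()) ** 2)
    return float(np.sqrt((n - 1) / n * spread))


values = np.ma.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 50.0],
                     mask=[0, 0, 0, 0, 0, 0, 1])
se_mean = jackknife_se(values, np.mean)
se_median = jackknife_se(values, np.median)
```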
@@ -172,16 +181,18 @@
     """Returns the selected confidence interval of the trimmed mean along the
 given axis.
 
-*Parameters* :
-    data : {sequence}
+Parameters
+----------
+    data : sequence
         Input data. The data is transformed to a masked array
-    proportiontocut : {float}
+    proportiontocut : float
         Proportion of the data to cut from each side of the data.
         As a result, (2*proportiontocut*n) values are actually trimmed.
-    alpha : {float}
+    alpha : float
         Confidence level of the intervals.
-    axis : {integer}
+    axis : int
         Axis along which to cut. If None, uses a flattened version of the input.
+    
     """
     data = masked_array(data, copy=False)
     trimmed = trim_both(data, proportiontocut=proportiontocut, axis=axis)
@@ -196,13 +207,15 @@
     """Returns the Maritz-Jarrett estimators of the standard error of selected
 experimental quantiles of the data.
 
-*Parameters* :
-    data: {ndarray}
+Parameters
+-----------
+    data : ndarray
         Data array.
-    prob: {sequence}
+    prob : sequence
         Sequence of quantiles to compute.
-    axis : {integer}
+    axis : int
         Axis along which to compute the quantiles. If None, use a flattened array.
+    
     """
     def _mjci_1D(data, p):
         data = data.compressed()
@@ -236,14 +249,15 @@
     """Computes the alpha confidence interval for the selected quantiles of the
 data, with Maritz-Jarrett estimators.
 
-*Parameters* :
-    data: {ndarray}
+Parameters
+----------
+    data : ndarray
         Data array.
-    prob: {sequence}
+    prob : sequence
         Sequence of quantiles to compute.
-    alpha : {float}
+    alpha : float
         Confidence level of the intervals.
-    axis : {integer}
+    axis : int
         Axis along which to compute the quantiles. If None, use a flattened array.
     """
     alpha = min(alpha, 1-alpha)
@@ -258,13 +272,14 @@
     """Computes the alpha-level confidence interval for the median of the data,
 following the Hettmansperger-Sheather method.
 
-*Parameters* :
-    data : {sequence}
+Parameters
+----------
+    data : sequence
         Input data. Masked values are discarded. The input should be 1D only, or
         axis should be set to None.
-    alpha : {float}
+    alpha : float
         Confidence level of the intervals.
-    axis : {integer}
+    axis : int
         Axis along which to compute the quantiles. If None, use a flattened array.
     """
     def _cihs_1D(data, alpha):
@@ -299,7 +314,8 @@
 The comparison is performed using the McKean-Schrader estimate of the standard
 error of the medians.
 
-*Parameters* :
+Parameters
+----------
     group_1 : {sequence}
         First dataset.
     group_2 : {sequence}
@@ -307,7 +323,8 @@
     axis : {integer}
         Axis along which the medians are estimated. If None, the arrays are flattened.
 
-*Returns* :
+Returns
+-------
     A (p,) array of comparison values.
 
     """
@@ -325,22 +342,23 @@
 #..............................................................................
 def rank_data(data, axis=None, use_missing=False):
     """Returns the rank (also known as order statistics) of each data point
-along the given axis.
+    along the given axis.
 
-If some values are tied, their rank is averaged.
-If some values are masked, their rank is set to 0 if use_missing is False, or
-set to the average rank of the unmasked values if use_missing is True.
+    If some values are tied, their rank is averaged.
+    If some values are masked, their rank is set to 0 if use_missing is False, 
+    or set to the average rank of the unmasked values if use_missing is True.
 
-*Parameters* :
-    data : {sequence}
-        Input data. The data is transformed to a masked array
-    axis : {integer}
-        Axis along which to perform the ranking. If None, the array is first
-        flattened. An exception is raised if the axis is specified for arrays
-        with a dimension larger than 2
-    use_missing : {boolean}
-        Whether the masked values have a rank of 0 (False) or equal to the
-        average rank of the unmasked values (True).
+    Parameters
+    ----------
+        data : sequence
+            Input data. The data is transformed to a masked array.
+        axis : int
+            Axis along which to perform the ranking.
+            If None, the array is first flattened. An exception is raised if
+            the axis is specified for arrays with more than 2 dimensions.
+        use_missing : boolean
+            Whether the masked values have a rank of 0 (False) or equal to the
+            average rank of the unmasked values (True).
     """
     #
     def _rank1d(data, use_missing=False):

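The ranking rules stated in the ``rank_data`` docstring (tied values share the average of their ranks; masked entries get 0, or the mean rank (n+1)/2 of the unmasked values when ``use_missing`` is True) can be sketched for flattened input as follows. ``rank_1d`` is a hypothetical helper and ignores the ``axis`` handling of the real function.

```python
import numpy as np


def rank_1d(data, use_missing=False):
    x = np.ma.asarray(data)
    mask = np.ma.getmaskarray(x)
    values = x.compressed()
    n = values.size
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(n, dtype=float)
    i = 0
    while i < n:
        # Extend j over the run of values tied with sorted_vals[i].
        j = i
        while j + 1 < n and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        # Sorted positions i..j share the average of the ranks i+1..j+1.
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    out = np.zeros(x.shape, dtype=float)  # masked entries default to rank 0
    out[~mask] = ranks
    if use_missing:
        out[mask] = (n + 1) / 2.0  # average rank of the n unmasked values
    return out
```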
Modified: branches/maskedarray/numpy/ma/mstats.py
===================================================================
--- branches/maskedarray/numpy/ma/mstats.py	2008-02-14 01:32:20 UTC (rev 4803)
+++ branches/maskedarray/numpy/ma/mstats.py	2008-02-14 01:32:35 UTC (rev 4804)
@@ -33,16 +33,20 @@
 
 def winsorize(data, alpha=0.2):
     """Returns a Winsorized version of the input array.
+    
+    The lowest (alpha/2.) fraction of the values is set to the (alpha/2.)
+    quantile, and the highest (alpha/2.) fraction is set to the
+    (1-alpha/2.) quantile.
+    Masked values are skipped.
 
-The (alpha/2.) lowest values are set to the (alpha/2.)th percentile, and
-the (alpha/2.) highest values are set to the (1-alpha/2.)th percentile
-Masked values are skipped.
+    Parameters
+    ----------
+        data : ndarray
+            Input data to Winsorize. The data is first flattened.
+        alpha : float
+            Total fraction of the data to Winsorize: alpha/2. on the left,
+            alpha/2. on the right.
 
-*Parameters*:
-    data : {ndarray}
-        Input data to Winsorize. The data is first flattened.
-    alpha : {float}, optional
-        Percentage of total Winsorization : alpha/2. on the left, alpha/2. on the right
     """
     data = masked_array(data, copy=False).ravel()
     idxsort = data.argsort()
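A minimal sketch of the Winsorization described in the ``winsorize`` docstring, assuming that ``int(alpha/2. * n)`` unmasked values are replaced in each tail (the full body of the function is not shown in this diff). ``winsorize_sketch`` is a hypothetical name.

```python
import numpy as np


def winsorize_sketch(data, alpha=0.2):
    # Work on a flattened copy so the caller's array is left untouched.
    x = np.ma.array(data, copy=True).ravel()
    valid = np.flatnonzero(~np.ma.getmaskarray(x))
    # Indices of the unmasked values, in increasing order of value.
    order = valid[np.argsort(x.data[valid], kind="mergesort")]
    n = order.size
    k = int(alpha / 2.0 * n)  # number of values replaced in each tail
    if k > 0:
        x[order[:k]] = x[order[k]]              # low tail -> smallest kept value
        x[order[n - k:]] = x[order[n - k - 1]]  # high tail -> largest kept value
    return x
```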
@@ -53,18 +57,26 @@
 
 #..............................................................................
 def trim_both(data, proportiontocut=0.2, axis=None):
-    """Trims the data by masking the int(trim*n) smallest and int(trim*n) largest
-values of data along the given axis, where n is the number of unmasked values.
+    """Trims the data by masking the int(proportiontocut*n) smallest and
+    int(proportiontocut*n) largest values of data along the given axis,
+    where n is the number of unmasked values.
 
-*Parameters*:
-    data : {ndarray}
-        Data to trim.
-    proportiontocut : {float}
-        Percentage of trimming. If n is the number of unmasked values before trimming,
-        the number of values after trimming is (1-2*trim)*n.
-    axis : {integer}
-        Axis along which to perform the trimming. If None, the input array is first
-        flattened.
+    Parameters
+    ----------
+        data : ndarray
+            Data to trim.
+        proportiontocut : float
+            Proportion of the data to trim from each end. If n is the number
+            of unmasked values before trimming, the number of values after
+            trimming is (1-2*proportiontocut)*n.
+        axis : int
+            Axis along which to perform the trimming. 
+            If None, the input array is first flattened.
+
+    Notes
+    -----
+        The function works only for arrays up to 2D.
+
     """
     #...................
     def _trim_1D(data, trim):
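``trimmed_mean`` further down is ``trim_both`` followed by ``mean``. The sketch below reproduces that pipeline for flattened input, assuming ``int(proportiontocut * n)`` values are masked in each tail as the docstring states; ``trim_both_sketch`` is hypothetical.

```python
import numpy as np


def trim_both_sketch(data, proportiontocut=0.2):
    # Flattened copy; trimming is done by masking, not by deleting values.
    x = np.ma.array(data, copy=True).ravel()
    valid = np.flatnonzero(~np.ma.getmaskarray(x))
    order = valid[np.argsort(x.data[valid], kind="mergesort")]
    n = order.size
    k = int(proportiontocut * n)  # values masked in each tail
    if k > 0:
        x[order[:k]] = np.ma.masked
        x[order[n - k:]] = np.ma.masked
    return x


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
trimmed = trim_both_sketch(sample, 0.2)
trimmed_mean = trimmed.mean()  # the outlier 100 no longer contributes
```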
@@ -87,22 +99,30 @@
 
 #..............................................................................
 def trim_tail(data, proportiontocut=0.2, tail='left', axis=None):
-    """Trims the data by masking int(trim*n) values from ONE tail of the data
-along the given axis, where n is the number of unmasked values.
+    """Trims the data by masking int(proportiontocut*n) values from ONE
+    tail of the data along the given axis, where n is the number of
+    unmasked values.
 
-*Parameters*:
-    data : {ndarray}
-        Data to trim.
-    proportiontocut : {float}
-        Percentage of trimming. If n is the number of unmasked values before trimming,
-        the number of values after trimming is (1-trim)*n.
-    tail : {string}
-        Trimming direction, in ('left', 'right'). If left, the proportiontocut
-        lowest values are set to the corresponding percentile. If right, the
-        proportiontocut highest values are used instead.
-    axis : {integer}
-        Axis along which to perform the trimming. If None, the input array is first
-        flattened.
+    Parameters
+    ----------
+        data : ndarray
+            Data to trim.
+        proportiontocut : float
+            Proportion of the data to trim. If n is the number of unmasked
+            values before trimming, the number of values after trimming is
+            (1-proportiontocut)*n.
+        tail : string
+            Trimming direction, in ('left', 'right').
+            If 'left', the lowest values are masked. If 'right', the
+            highest values are masked instead.
+        axis : int
+            Axis along which to perform the trimming. 
+            If None, the input array is first flattened.
+
+    Notes
+    -----
+        The function works only for arrays up to 2D.
+
     """
     #...................
     def _trim_1D(data, trim, left):
@@ -138,35 +158,43 @@
 
 #..............................................................................
 def trimmed_mean(data, proportiontocut=0.2, axis=None):
-    """Returns the trimmed mean of the data along the given axis. Trimming is
-performed on both ends of the distribution.
+    """Returns the trimmed mean of the data along the given axis. 
+    Trimming is performed on both ends of the distribution.
 
-*Parameters*:
-    data : {ndarray}
-        Data to trim.
-    proportiontocut : {float}
-        Proportion of the data to cut from each side of the data .
-        As a result, (2*proportiontocut*n) values are actually trimmed.
-    axis : {integer}
-        Axis along which to perform the trimming. If None, the input array is first
-        flattened.
+    Parameters
+    ----------
+        data : ndarray
+            Data to trim.
+        proportiontocut : float
+            Proportion of the data to cut from each side of the data.
+            As a result, (2*proportiontocut*n) values are actually trimmed.
+        axis : int
+            Axis along which to perform the trimming. 
+            If None, the input array is first flattened.
+
     """
     return trim_both(data, proportiontocut=proportiontocut, axis=axis).mean(axis=axis)
 
 #..............................................................................
 def trimmed_stde(data, proportiontocut=0.2, axis=None):
     """Returns the standard error of the trimmed mean for the input data,
-along the given axis. Trimming is performed on both ends of the distribution.
+    along the given axis. Trimming is performed on both ends of the distribution.
 
-*Parameters*:
-    data : {ndarray}
-        Data to trim.
-    proportiontocut : {float}
-        Proportion of the data to cut from each side of the data .
-        As a result, (2*proportiontocut*n) values are actually trimmed.
-    axis : {integer}
-        Axis along which to perform the trimming. If None, the input array is first
-        flattened.
+    Parameters
+    ----------
+        data : ndarray
+            Data to trim.
+        proportiontocut : float
+            Proportion of the data to cut from each side of the data.
+            As a result, (2*proportiontocut*n) values are actually trimmed.
+        axis : int
+            Axis along which to perform the trimming. 
+            If None, the input array is first flattened.
+
+    Notes
+    -----
+        The function works only for arrays up to 2D.
+
     """
     #........................
     def _trimmed_stde_1D(data, trim=0.2):
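One standard estimator of the standard error of a trimmed mean (the Tukey-McLaughlin form) divides the standard deviation of the correspondingly Winsorized sample by (1 - 2g) sqrt(n), where g is ``proportiontocut``. The body of ``_trimmed_stde_1D`` is not visible in this diff, so the sketch below assumes that form; ``trimmed_stde_sketch`` is a hypothetical name.

```python
import numpy as np


def trimmed_stde_sketch(data, proportiontocut=0.2):
    x = np.sort(np.ma.asarray(data, dtype=float).compressed())
    n = x.size
    k = int(proportiontocut * n)
    w = x.copy()
    if k > 0:
        # Winsorize: pull each tail in to the nearest value that survives trimming.
        w[:k] = x[k]
        w[n - k:] = x[n - k - 1]
    return float(w.std(ddof=1) / ((1.0 - 2.0 * proportiontocut) * np.sqrt(n)))


se = trimmed_stde_sketch(np.arange(1, 11), 0.2)
```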
@@ -189,13 +217,14 @@
     """Returns the McKean-Schrader estimate of the standard error of the sample
 median along the given axis.
 
+    Parameters
+    ----------
+        data : ndarray
+            Input data.
+        axis : int
+            Axis along which to compute the standard error.
+            If None, the input array is first flattened.
 
-*Parameters*:
-    data : {ndarray}
-        Data to trim.
-    axis : {integer}
-        Axis along which to perform the trimming. If None, the input array is first
-        flattened.
     """
     def _stdemed_1D(data):
         sorted = numpy.sort(data.compressed())
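The McKean-Schrader estimate is commonly written as SE = (X(n-k+1) - X(k)) / (2 z), with z the 0.995 quantile of the standard normal and k = round((n+1)/2 - z sqrt(n/4)). Only the first line of ``_stdemed_1D`` appears in this diff, so the sketch below assumes that form; ``stde_median_sketch`` is hypothetical.

```python
import numpy as np

Z_995 = 2.5758293035489004  # 0.995 quantile of the standard normal distribution


def stde_median_sketch(data):
    # Masked values are discarded, then the data are sorted.
    x = np.sort(np.ma.asarray(data, dtype=float).compressed())
    n = x.size
    k = int(round((n + 1) / 2.0 - Z_995 * np.sqrt(n / 4.0)))
    if k < 1:
        raise ValueError("sample too small for this approximation")
    # Width of an approximate 99% order-statistic interval for the median,
    # divided by 2*z (x is 0-based, so X(n-k+1) is x[n-k] and X(k) is x[k-1]).
    return float((x[n - k] - x[k - 1]) / (2.0 * Z_995))


se_med = stde_median_sketch(np.arange(1, 21))
```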
@@ -240,16 +269,17 @@
     - (.4,.4)  : approximately quantile unbiased (Cunnane)
     - (.35,.35): APL, used with PWM
 
-*Parameters*:
-    x : {sequence}
+Parameters
+----------
+    x : sequence
         Input data, as a sequence or array of dimension at most 2.
-    prob : {sequence}
+    prob : sequence
         List of quantiles to compute.
-    alpha : {float}
+    alpha : float
         Plotting positions parameter.
-    beta : {float}
+    beta : float
         Plotting positions parameter.
-    axis : {integer}
+    axis : int
         Axis along which to compute the quantiles. If None, the input array is first
         flattened.
     """
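Assuming the (alpha, beta) pairs listed above follow the continuous family behind R's quantile types 4 to 9, a quantile is a linear interpolation between order statistics at position n*p + m, with m = alpha + p*(1 - alpha - beta). With alpha = beta = 1 this reduces to R type 7, NumPy's default linear method, which gives a convenient check. ``mquantile_sketch`` handles one probability on 1D data and is hypothetical.

```python
import numpy as np


def mquantile_sketch(data, p, alpha=0.4, beta=0.4):
    # Masked values are discarded before the order statistics are taken.
    x = np.sort(np.ma.asarray(data, dtype=float).compressed())
    n = x.size
    m = alpha + p * (1.0 - alpha - beta)
    h = n * p + m - 1.0            # 0-based fractional position of the quantile
    h = min(max(h, 0.0), n - 1.0)  # clamp to the range of the data
    lo = int(np.floor(h))
    hi = min(lo + 1, n - 1)
    g = h - lo
    return float((1.0 - g) * x[lo] + g * x[hi])
```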
@@ -299,6 +329,18 @@
           if x is normally distributed (R type 9)
         - (.4,.4)  : approximately quantile unbiased (Cunnane)
         - (.35,.35): APL, used with PWM
+
+Parameters
+----------
+    x : sequence
+        Input data, as a sequence or array of dimension at most 2.
+    prob : sequence
+        List of quantiles to compute.
+    alpha : float
+        Plotting positions parameter.
+    beta : float
+        Plotting positions parameter.
+
     """
     data = masked_array(data, copy=False).reshape(1,-1)
     n = data.count()
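Under the usual convention for these (alpha, beta) parameters, the i-th smallest of n unmasked values gets the plotting position (i - alpha) / (n + 1 - alpha - beta): (0, 0) gives Weibull's i/(n+1) and (.4, .4) gives Cunnane's (i - .4)/(n + .2). Only the first lines of the routine are visible here, so the sketch below, with the hypothetical name ``plotting_positions_sketch``, assumes that formula.

```python
import numpy as np


def plotting_positions_sketch(data, alpha=0.4, beta=0.4):
    x = np.ma.asarray(data, dtype=float).compressed()  # masked values dropped
    n = x.size
    order = np.argsort(x, kind="mergesort")
    pos = np.empty(n)
    # The i-th smallest value (i = 1..n) gets (i - alpha) / (n + 1 - alpha - beta);
    # results are returned in the original order of the unmasked data.
    pos[order] = (np.arange(1, n + 1) - alpha) / (n + 1.0 - alpha - beta)
    return pos
```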
@@ -311,7 +353,11 @@
 
 
 def mmedian(data, axis=None):
-    """Returns the median of data along the given axis. Missing data are discarded."""
+    """Returns the median of data along the given axis. 
+
+    Missing data are discarded.
+
+    """
     def _median1D(data):
         x = numpy.sort(data.compressed())
         if x.size == 0:
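In current NumPy the same "discard missing data" median is available as ``numpy.ma.median``; ``mmedian`` itself belongs to this branch and is not assumed to exist. The sketch shows both the global and the per-axis behaviour.

```python
import numpy as np

x = np.ma.array([[1, 5, 100],
                 [2, 4, 6]],
                mask=[[0, 0, 1],
                      [0, 0, 0]])

overall = np.ma.median(x)          # median of the unmasked 1, 5, 2, 4, 6
per_row = np.ma.median(x, axis=1)  # medians of (1, 5) and (2, 4, 6)
```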
@@ -331,17 +377,18 @@
 Normalization is by (N-1) where N is the number of observations (unbiased
 estimate).  If bias is True then normalization is by N.
 
-*Parameters*:
-    x : {ndarray}
+Parameters
+----------
+    x : ndarray
         Input data. If x is a 1D array, returns the variance. If x is a 2D array,
         returns the covariance matrix.
-    y : {ndarray}, optional
+    y : ndarray
         Optional set of variables.
-    rowvar : {boolean}
+    rowvar : boolean
         If rowvar is true, then each row is a variable with observations in columns.
         If rowvar is False, each column is a variable and the observations are in
         the rows.
-    bias : {boolean}
+    bias : boolean
         Whether to use a biased or unbiased estimate of the covariance.
         If bias is True, then the normalization is by N, the number of observations.
         Otherwise, the normalization is by (N-1)
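The normalisation rules in this docstring can be checked against ``numpy.ma.cov`` in a current NumPy release, which takes the same ``rowvar`` and ``bias`` arguments. In the sketch, the masked entry is excluded both from the mean and from the observation count N.

```python
import numpy as np

# The masked 50.0 does not enter the mean or the count N = 4.
x = np.ma.array([1.0, 2.0, 3.0, 4.0, 50.0], mask=[0, 0, 0, 0, 1])

var_unbiased = float(np.ma.cov(x))           # 1D input: variance, divided by N - 1
var_biased = float(np.ma.cov(x, bias=True))  # divided by N

pairs = np.ma.array([[1.0, 2.0, 3.0, 4.0],
                     [2.0, 4.0, 6.0, 8.0]])
cov_matrix = np.ma.cov(pairs)  # rowvar=True: each row is a variable
```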
@@ -400,10 +447,10 @@
     """Evaluates Rosenblatt's shifted histogram estimators for each point
 on the dataset 'data'.
 
-*Parameters* :
-    data : {sequence}
+Parameters
+----------
+    data : sequence
         Input data. Masked values are ignored.
-    points : {sequence}
+    points : sequence
         Sequence of points where to evaluate Rosenblatt shifted histogram.
         If None, use the data.
     """
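Rosenblatt's shifted histogram estimates the density at a point as the fraction of observations within a window of half-width h around it, divided by the window width 2h. The diff does not show how the routine chooses its bandwidth, so the sketch below takes ``h`` as an explicit argument; ``rsh_sketch`` and its ``h`` parameter are hypothetical.

```python
import numpy as np


def rsh_sketch(data, points=None, h=1.0):
    x = np.ma.asarray(data, dtype=float).compressed()  # masked values ignored
    pts = x if points is None else np.asarray(points, dtype=float)
    n = x.size
    # Count observations in the closed window [p - h, p + h] for every point p.
    upper = (x[:, None] <= pts[None, :] + h).sum(axis=0)
    lower = (x[:, None] < pts[None, :] - h).sum(axis=0)
    return (upper - lower) / (2.0 * n * h)
```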