[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

11 Aug 2023

This has come up before; see https://github.com/numpy/numpy/issues/6044 for
the first time it came up. Several subsequent discussions are linked
there.

In the meantime, the data APIs consortium has been actively working on
adding a `cumulative_sum` function to the array API standard, see
https://github.com/data-apis/array-api/issues/597 and
https://github.com/data-apis/array-api/pull/653. The proposed
`cumulative_sum` function includes an `include_initial` keyword argument
that gets the OP's desired behavior.
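To make the `include_initial` semantics concrete, here is a small sketch. It assumes NumPy 2.1 or later, which (after this thread) added `np.cumulative_sum` with an `include_initial` keyword following the array API standard. On older versions it falls back to prepending a zero to `np.cumsum`. The input array is arbitrary.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# Assumption: np.cumulative_sum exists only in NumPy >= 2.1.
if hasattr(np, "cumulative_sum"):
    with_initial = np.cumulative_sum(x, include_initial=True)
else:
    # Fallback: prepend a zero of the accumulator dtype to cumsum's result.
    c = np.cumsum(x)
    with_initial = np.concatenate((np.zeros(1, dtype=c.dtype), c))

without_initial = np.cumsum(x)
print(without_initial)  # [ 1  3  6 10]
print(with_initial)     # [ 0  1  3  6 10]
```

With `include_initial=True` the output has one more element than the input, starting with the additive identity.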

I think we should probably eventually deprecate `cumsum` and `cumprod` in
favor of the array API standard's `cumulative_sum` and `cumulative_product`,
if only because of the embarrassing naming issue. Once the array API
standard has finalized the name for the keyword argument, I think it makes
sense to add the keyword argument to `np.cumsum`, even if we don't deprecate
it yet. I don't think it makes sense to add a new function just for this.
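A rough sketch of what a keyword on `cumsum` (rather than a new function) could look like. `hypothetical_cumsum` and its `include_initial` parameter are illustrative names only, not an existing NumPy API. The sketch pads the ordinary `np.cumsum` result with a leading zero along the summed axis.

```python
import numpy as np


def hypothetical_cumsum(a, axis=None, dtype=None, include_initial=False):
    """Hypothetical np.cumsum with an array-API-style include_initial flag."""
    result = np.cumsum(a, axis=axis, dtype=dtype)
    if not include_initial:
        return result  # unchanged behaviour when the flag is off
    # np.cumsum flattens when axis is None, so the result is then 1-D.
    ax = 0 if axis is None else axis
    pad = [(0, 0)] * result.ndim
    pad[ax] = (1, 0)  # one zero before the first element along `ax`
    return np.pad(result, pad)  # mode="constant", constant_values=0


a = np.array([[1, 2, 3], [4, 5, 6]])
print(hypothetical_cumsum(a, axis=1, include_initial=True).tolist())
# [[0, 1, 3, 6], [0, 4, 9, 15]]
```

Padding after the fact keeps the default path identical to today's `cumsum`; the cost is an extra copy, which an implementation inside NumPy itself would presumably avoid.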

On Fri, Aug 11, 2023 at 6:34 AM <john.dawson@camlingroup.com> wrote:
...
`cumsum` computes the sum of the first k summands for every k from 1.
In my experience, it is more often useful to compute the sum of the
first k summands for every k from 0, as `cumsum`'s behaviour leads to
fencepost-like problems.
https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error
For example, `cumsum` is not the inverse of `diff`. I propose adding a
function to NumPy to compute cumulative sums beginning with 0, that is, an
inverse of `diff`. It might be called `cumsum0`. The following code is
probably not the best way to implement it, but it illustrates the desired
behaviour.
```
import numpy as np


def cumsum0(a, axis=None, dtype=None, out=None):
    """
    Return the cumulative sum of the elements along a given axis,
    beginning with 0.

    cumsum0 does the same as cumsum except that cumsum computes the sum
    of the first k summands for every k from 1 and cumsum0, from 0.

    Parameters
    ----------
    a : array_like
        Input array.
    axis : int, optional
        Axis along which the cumulative sum is computed. The default
        (None) is to compute the cumulative sum over the flattened
        array.
    dtype : dtype, optional
        Type of the returned array and of the accumulator in which the
        elements are summed. If `dtype` is not specified, it defaults to
        the dtype of `a`, unless `a` has an integer dtype with a
        precision less than that of the default platform integer. In
        that case, the default platform integer is used.
    out : ndarray, optional
        Alternative output array in which to place the result. It must
        have the same shape and buffer length as the expected output but
        the type will be cast if necessary. See
        :ref:`ufuncs-output-type` for more details.

    Returns
    -------
    cumsum0_along_axis : ndarray
        A new array holding the result is returned unless `out` is
        specified, in which case a reference to `out` is returned. If
        `axis` is not None the result has the same shape as `a` except
        along `axis`, where the dimension is larger by 1. If `axis` is
        None the result is 1-D with ``a.size + 1`` elements.

    See Also
    --------
    cumsum : Cumulatively sum array elements, beginning with the first.
    sum : Sum array elements.
    trapz : Integration of array values using the composite trapezoidal
        rule.
    diff : Calculate the n-th discrete difference along given axis.

    Notes
    -----
    Arithmetic is modular when using integer types, and no error is
    raised on overflow.

    ``cumsum0(a)[-1]`` may not be equal to ``sum(a)`` for floating-point
    values since ``sum`` may use a pairwise summation routine, reducing
    the roundoff-error. See `sum` for more information.

    Examples
    --------
    >>> a = np.array([[1, 2, 3], [4, 5, 6]])
    >>> a
    array([[1, 2, 3],
           [4, 5, 6]])
    >>> np.cumsum0(a)
    array([ 0,  1,  3,  6, 10, 15, 21])
    >>> np.cumsum0(a, dtype=float)  # specifies type of output value(s)
    array([ 0.,  1.,  3.,  6., 10., 15., 21.])

    >>> np.cumsum0(a, axis=0)  # sum over rows for each of the 3 columns
    array([[0, 0, 0],
           [1, 2, 3],
           [5, 7, 9]])
    >>> np.cumsum0(a, axis=1)  # sum over columns for each of the 2 rows
    array([[ 0,  1,  3,  6],
           [ 0,  4,  9, 15]])

    ``cumsum0(b)[-1]`` may not be equal to ``sum(b)``

    >>> b = np.array([1, 2e-9, 3e-9] * 1000000)
    >>> np.cumsum0(b)[-1]
    1000000.0050045159
    >>> b.sum()
    1000000.0050000029
    """
    a = np.asarray(a)  # accept any array_like, not only ndarrays
    # Summing an empty slice along `axis` yields a zero of the accumulator
    # dtype; keepdims=True keeps it concatenable with the cumulative sums.
    empty = a.take([], axis=axis)
    zero = empty.sum(axis, dtype=dtype, keepdims=True)
    later_cumsum = a.cumsum(axis, dtype=dtype)
    # Both pieces already share the accumulator dtype, so `dtype` is not
    # passed again (np.concatenate rejects `dtype` together with `out`).
    return np.concatenate([zero, later_cumsum], axis=axis, out=out)
```
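To illustrate the fencepost point, here is a stripped-down, 1-D-only stand-in for the `cumsum0` above (a simplification for demonstration, not the proposed implementation). Starting the sums at 0 gives n + 1 "posts" for n "gaps", so `np.diff` recovers the input exactly, and adding back the first value recovers an array from its differences.

```python
import numpy as np


def cumsum0(a):
    """Minimal 1-D sketch: cumulative sums beginning with 0."""
    c = np.cumsum(a)
    return np.concatenate((np.zeros(1, dtype=c.dtype), c))


a = np.array([3, 1, 4, 1, 5])
print(np.diff(np.cumsum(a)))  # [1 4 1 5]    (a[0] is lost)
print(np.diff(cumsum0(a)))    # [3 1 4 1 5]  (a is recovered)

# The other direction: rebuild x from its differences and its first value.
x = np.array([2, 5, 6, 10])
print(x[0] + cumsum0(np.diff(x)))  # [ 2  5  6 10]
```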
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-leave@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: nathan12343@gmail.com

Nathan