[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

Aug. 22, 2023

      I don’t have an issue with cumsum0 if it is approached as a request for a useful utility function.

But arguing that this is what a cumulative sum function should be doing is a very big stretch. A cumulative sum has a foundational meaning and purpose, clearly reflected in its name: not to solve a fencepost error, but to accumulate the running sum of a sequence. Prepending 0 as part of it feels very unnatural. It is simply an extra operation.

diff0, in my opinion, has a bit more intuitive sense to it, but obviously there is no need to add it if no one else needs/uses it.
...
On 22 Aug 2023, at 17:36, john.dawson@camlingroup.com wrote:
Dom Grigonis wrote:
...
1. Dimension length stays constant, while cumsum0 extends the length to n+1 and np.diff then truncates it back. This adds extra complexity, whereas things are very convenient to work with when dimension length stays constant throughout the code.
For n values there are n-1 differences. Equivalently, for k differences there are k+1 values. Herefor, `diff` ought to reduce length by 1 and `cumsum` ought to increase it by 1. Returning arrays of the same length is a fencepost error. This is a problem in the current behaviour of `cumsum` and the proposed behaviour of `diff0`.
diff0 doesn’t solve the error in a strict sense. However, the first value of the diff0 result becomes the starting point from which to count the remaining differences, so with the right approach it does solve the issue: if the starting values are subtracted first, it does the same thing, just in a different order. See below:
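The length bookkeeping discussed here can be sketched as follows. Neither `cumsum0` nor `diff0` exists in NumPy; the definitions below are assumptions inferred from this thread (`cumsum0` prepends a zero to `np.cumsum`, `diff0` is `np.diff` with a zero prepended), and the sample array `x` is made up for illustration.

```python
import numpy as np


def cumsum0(a, axis=0):
    """Hypothetical helper: cumulative sum with a leading zero (length n + 1)."""
    a = np.asarray(a)
    zero_shape = list(a.shape)
    zero_shape[axis] = 1
    zeros = np.zeros(zero_shape, dtype=a.dtype)
    return np.concatenate([zeros, np.cumsum(a, axis=axis)], axis=axis)


def diff0(a, axis=0):
    """Hypothetical helper: differences with the first value kept (length n)."""
    return np.diff(a, axis=axis, prepend=0)


x = np.array([3.0, 5.0, 4.0, 9.0])    # illustrative values only

steps = np.diff(x)                    # n - 1 differences
via_cumsum0 = cumsum0(steps)          # back to n values, starting at 0

steps_same_len = diff0(x - x[0])      # n values, the first one is 0
via_cumsum = np.cumsum(steps_same_len)

print(via_cumsum0)
print(via_cumsum)
```

Under these assumed definitions the two routes produce the same array; the difference is only where the leading zero is introduced.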
...
------------------------------------------------------------
EXAMPLE
Consider a path given by a list of points, say (101, 203), (102, 205), (107, 204) and (109, 202). What are the positions at fractions, say 1/3 and 2/3, along the path (linearly interpolating)?
The problem is naturally solved with `diff` and `cumsum0`:
```
import numpy as np
from scipy import interpolate
positions = np.array([[101, 203], [102, 205], [107, 204], [109, 202]], dtype=float)
steps_2d = np.diff(positions, axis=0)
steps_1d = np.linalg.norm(steps_2d, axis=1)
distances = np.cumsum0(steps_1d)
fractions = distances / distances[-1]
interpolate_at = interpolate.make_interp_spline(fractions, positions, 1)
interpolate_at(1/3)
interpolate_at(2/3)
```
Please show how to solve the problem with `diff0` and `cumsum`.
------------------------------------------------------------
...
------------------------------------------------------------
EXAMPLE
Money is invested on 2023-01-01. The annualized rate is 4% until 2023-02-04 and 5% thence until 2023-04-02. By how much does the money multiply in this time?
The problem is naturally solved with `diff`:
```
import numpy as np
percents = np.array([4, 5], dtype=float)
times = np.array(["2023-01-01", "2023-02-04", "2023-04-02"], dtype=np.datetime64)
durations = np.diff(times)
YEAR = np.timedelta64(365, "D")
multipliers = (1 + percents / 100) ** (durations / YEAR)
multipliers.prod()
```
Please show how to solve the problem with `diff0`. It makes sense to divide `np.diff(times)` by `YEAR`, but it would not make sense to divide the output of `np.diff0(times)` by `YEAR` because of its incongruous initial value.
------------------------------------------------------------
In my experience it is more sensible to use a time series approach, where the whole path of the investment is calculated. The same code can then be used for modelling, analysis and presentation to clients. I would do it like this:
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
# diff0 is the proposed function under discussion; it is not part of NumPy.

# Investment example
r = np.log(1 + np.array([0, 0.04, 0.05]))
start_date = np.array("2023-01-01", dtype=np.datetime64)
times = np.array(["2023-01-01", "2023-02-04", "2023-04-02"], dtype=np.datetime64)
t = (times - start_date).astype(float) / 365
dt = diff0(t)
normalised = np.exp(np.cumsum(r * dt))
# PLOT
s0 = 1000
plt.plot(s0 * normalised)

# Path example
positions = np.array([[101, 203], [102, 205], [107, 204], [109, 202]], dtype=float)
positions_rel = positions - positions[0, None]
steps_2d = diff0(positions_rel, axis=0)
steps_1d = np.linalg.norm(steps_2d, axis=1)
distances = np.cumsum(steps_1d)
fractions = distances / distances[-1]
interpolate_at = interpolate.make_interp_spline(fractions, positions, 1)
print(interpolate_at(1/3))
print(interpolate_at(2/3))
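For the investment example, the two routes can be compared numerically. This is a minimal sketch, assuming that `diff0` means `np.diff` with a zero prepended (it is not a NumPy function); the variable names are chosen here for illustration.

```python
import numpy as np


def diff0(a):
    """Hypothetical helper (assumed definition): np.diff with a 0 prepended."""
    return np.diff(a, prepend=0)


times = np.array(["2023-01-01", "2023-02-04", "2023-04-02"], dtype="datetime64[D]")
year = np.timedelta64(365, "D")

# Route 1: n - 1 durations from np.diff, one rate per duration.
rates = np.array([0.04, 0.05])
durations = np.diff(times) / year          # float fractions of a year
total_via_diff = np.prod((1 + rates) ** durations)

# Route 2: arrays of equal length; a dummy 0 rate pairs with the leading 0 duration.
log_rates = np.log1p(np.array([0.0, 0.04, 0.05]))
t = (times - times[0]) / year              # elapsed time in years, starts at 0
path = np.exp(np.cumsum(log_rates * diff0(t)))
total_via_diff0 = path[-1]

print(total_via_diff, total_via_diff0)
```

Both routes give the same final multiplier; the second also yields the whole path, at the cost of carrying a placeholder rate for the first element.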

Apart from the responses above, diff0 is useful in data analysis. Indices and observations usually have the same length. It is always convenient to keep it that way, and it makes for nice, clean and simple code.
t = dates
s = observations
# Plot changes:
ds = diff0(s)
plt.plot(dates, ds)
# 2nd order changes
plt.plot(dates, diff0(ds))
# Moving average of changes
plt.plot(dates, bottleneck.move_mean(ds, 3))
...