[Numpy-discussion] padding options for diff

Marten van Kerkwijk m.h.vankerkwijk at gmail.com
Fri Oct 28 09:23:20 EDT 2016

Matthew has made what looks like a very nice implementation of padding
in np.diff in https://github.com/numpy/numpy/pull/8206. I raised two
general questions about desired behaviour there that Matthew thought
we should put out on the mailing list as well. This indeed seemed a
good opportunity to get feedback, so herewith a copy.

-- Marten

1. I'm not sure that treating a 1-d array as something that will just
extend the result along `axis` is a good idea, as it breaks standard
broadcasting rules. E.g., consider
np.diff([[1, 2], [4, 8]], to_begin=[1, 4])
# with your PR:
array([[1, 4, 1],
       [1, 4, 4]])
# but from regular broadcasting I would expect
array([[1, 1],
       [4, 4]])
# i.e., the same as if I did to_begin=[[1, 4]]
I think it is slightly odd to break the broadcasting expectation here,
especially since the regular use case surely is just to add a single
element so that one keeps the original shape. The advantage of
assuming this is that you do not have to do *any* array shaping of
`to_begin` and `to_end` (which perhaps also suggests it is the right
thing to do).
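
To make the "single element that keeps the original shape" case concrete,
here is a minimal sketch of one possible reading. It is not the code in the
PR: `diff_keep_shape` is a hypothetical helper, and the rule it uses
(broadcast the pad to the shape of `a` with length 1 along `axis`) is an
assumption made only for illustration. Only existing NumPy calls are used.

```python
import numpy as np

def diff_keep_shape(a, to_begin, axis=-1):
    """Hypothetical helper: first difference along `axis`, padded in
    front with one element so the result has the same shape as `a`."""
    a = np.asarray(a)
    d = np.diff(a, axis=axis)
    # Target shape for the pad: same as `a`, but length 1 along `axis`.
    pad_shape = list(a.shape)
    pad_shape[axis] = 1
    # Ordinary broadcasting does all the shaping; a pad that cannot be
    # broadcast to this shape raises ValueError.
    pad = np.broadcast_to(to_begin, tuple(pad_shape))
    return np.concatenate([pad, d], axis=axis)

a = np.array([[1, 2], [4, 8]])
print(diff_keep_shape(a, 0))           # scalar pad, same value in every row
print(diff_keep_shape(a, [[1], [4]]))  # one pad value per row
```

With this rule no reshaping logic for `to_begin` is needed beyond the single
`np.broadcast_to` call.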

2. As I mentioned above, I think it may be worth thinking through a
little what to do with higher order differences, at least for
`to_begin='first'`. If the goal is to ensure that with that option, it
becomes the inverse of `cumsum`, then I think for higher order one
should add multiple elements in front, i.e., for that case, the
recursive call should be
return np.diff(np.diff(a, to_begin='first'), n-1, to_begin='first')
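
As an illustration of why multiple elements are needed in front, here is a
small sketch that emulates `to_begin='first'` with plain `np.concatenate`,
since that option exists only in the PR. The name `diff_first` is
hypothetical, and the loop is just an iterative form of the recursive call
above, assuming `a` is non-empty along `axis`.

```python
import numpy as np

def diff_first(a, n=1, axis=-1):
    """Hypothetical helper: n-th order difference where each order
    keeps its own first element in front, so n cumsums invert it."""
    a = np.asarray(a)
    for _ in range(n):
        # First element along `axis`, kept with length 1 in that direction.
        first = np.take(a, [0], axis=axis)
        a = np.concatenate([first, np.diff(a, axis=axis)], axis=axis)
    return a

a = np.array([1, 4, 9, 16, 25])
d2 = diff_first(a, n=2)
restored = np.cumsum(np.cumsum(d2))
print(d2)        # second-order difference, same length as `a`
print(restored)  # applying cumsum twice gives back `a`
```

Padding only once after taking both differences would lose the first element
of the intermediate first-order result, and the round trip through two
`cumsum` calls would no longer recover `a`.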
