[Numpy-discussion] ANN: MyGrad 2.0 - Drop-in autodiff for NumPy

Ryan Soklaski rsoklaski at gmail.com
Tue Apr 20 23:57:06 EDT 2021


Hi Stephan,

You are correct that MyGrad takes an object-oriented design, rather than a
functional one. This enables a more imperative style of workflow [1], which
is how many people approach data science in notebooks and REPLs. MyGrad
feels similar to NumPy and PyTorch in this way.

Ultimately, swapping one (or more) `ndarray` with a `Tensor` is all you need
to do to differentiate your NumPy-based code with respect to that variable:

    import mygrad
    from stephans_library import func_using_numpy

    x = mygrad.tensor(1.)
    y = mygrad.tensor(2.)
    z = func_using_numpy(x, y)  # coerced into returning a Tensor
    z.backward()  # computes dz/dx and dz/dy
    x.grad  # stores dz/dx
    y.grad  # stores dz/dy

Thus, with MyGrad you can truly drop a Tensor into code that is written in
vanilla NumPy (assuming said code involves NumPy functions currently
implemented in MyGrad), no matter what style that code is written in,
functional or otherwise.
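
For instance, here is a minimal sketch of what that looks like in practice
(`hypotenuse` is just a stand-in for any function written against plain
NumPy):

    import numpy as np
    import mygrad

    def hypotenuse(a, b):
        # plain NumPy code -- it knows nothing about MyGrad
        return np.sqrt(a ** 2 + b ** 2)

    a = mygrad.tensor(3.0)
    b = mygrad.tensor(4.0)
    c = hypotenuse(a, b)  # np.sqrt receives Tensors and returns a Tensor
    c.backward()
    a.grad  # dc/da = a / c = 0.6
    b.grad  # dc/db = b / c = 0.8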

Regarding autograd, I would describe it as "swap-out" autodiff for NumPy,
rather than "drop-in". Indeed, you need to use the functions supplied by
`autograd.numpy` in order to leverage its functionality, in addition to
adopting a functional code style. This means that
`stephans_library.func_using_numpy` can't be differentiated by autograd
either, unless it were rewritten to use `autograd.numpy`.
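
For contrast, here is roughly what the equivalent workflow looks like with
autograd; note the wrapped-NumPy import and the functional `grad` API:

    import autograd.numpy as anp  # must use autograd's wrapped NumPy
    from autograd import grad

    def hypotenuse(a, b):
        return anp.sqrt(a ** 2 + b ** 2)

    dc_da = grad(hypotenuse, 0)  # returns a new function computing dc/da
    dc_db = grad(hypotenuse, 1)
    dc_da(3.0, 4.0)  # 0.6
    dc_db(3.0, 4.0)  # 0.8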

Furthermore, autograd does not really aim to be "NumPy with autodiff" with
the same fidelity. For example, and in contrast with MyGrad (a short sketch
follows the list below), it does not support:
  - in-place operations
  - specifying dtype, where, or out in ufuncs
  - common use-cases of einsum like traces and broadcast-reduction [2]
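
To make the first bullet concrete, here is a minimal sketch (reflecting my
understanding of MyGrad 2.0's in-place support) of an augmented assignment
flowing through backprop:

    import mygrad

    x = mygrad.tensor([1.0, 2.0, 3.0])
    y = 2 * x
    y *= 3  # the in-place update participates in the computational graph
    y.sum().backward()
    x.grad  # array([6., 6., 6.])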
 
And, unfortunately, autograd has some long-standing bugs, including cases
where it simply gives you the wrong derivatives for relatively simple
functions [3]. In general, there does not seem to be much activity towards
addressing bug reports in the library.

That being said, here are some pros and cons of MyGrad, by my own
estimation:

Some cons of MyGrad:
  - autograd provides rich support for computing Jacobians and higher-order
derivatives. MyGrad doesn't.
  - Still plenty of NumPy functions that need implementing
  - Supporting a flexible imperative style along with in-place operations
and views comes at a (mitigable) performance cost [4]
  - Currently maintained just by me in my personal time (hard to match
Harvard/Google/Facebook!), which doesn't scale
  - Nowhere close to the level of adoption of autograd

Some pros of MyGrad:
   - Easy for NumPy users to just pick up and use (NumPy +
`Tensor.backward()`)
   - Big emphasis on correctness and completeness in terms of parity with
NumPy
   - Object-oriented approach has lots of perks
      - Easy for users to implement their own differentiable functions [5]
      - Tensor can be wrapped by, say, an xarray DataArray for backprop
through xarray [6]
      - Tensor could wrap a CuPy/Dask/sparse array instead of an ndarray to
bring autodiff to them
   - Polished docs, type hints, UX
   - High-quality test suite (leverages Hypothesis [7] extensively)


[1]
https://gist.github.com/rsokl/7c2812264ae622bbecc990fad4af3fd2#getting-a-derivative
[2]
https://gist.github.com/rsokl/7c2812264ae622bbecc990fad4af3fd2#some-derivatives-not-supported-by-autograd
[3]
https://gist.github.com/rsokl/7c2812264ae622bbecc990fad4af3fd2#some-places-where-autograd-returns-incorrect-derivatives
[4]
https://mygrad.readthedocs.io/en/latest/performance_tips.html#controlling-memory-guarding-behavior
[5] https://mygrad.readthedocs.io/en/latest/operation.html
[6] Still lots of sharp edges here:
https://gist.github.com/rsokl/7c2812264ae622bbecc990fad4af3fd2#a-crude-prototype-of-using-mygrad-w-xarray
[7] https://hypothesis.readthedocs.io/en/latest/


