I recently published a simple Julia package, StochasticDiff.jl, to demonstrate a natural extension of automatic differentiation (AD) to stochastic processes.

Writing the code was actually the easy part; the hard part was finding a satisfying explanation for how it works. Along the way, I arrived at an interpretation I'm happy with that applies to both standard AD and stochastic AD and, in fact, extends to other differential graded algebras and beyond.

The idea was inspired by a beautiful Microsoft Research talk given by Conal Elliott.

Elliott highlights the importance of AD being “compositional”. That is, the derivative of the product of two objects should be expressible in terms of those two objects.

To quote from his article:

Strictly speaking, Theorem 1 is not a compositional recipe for differentiating sequential compositions, i.e., it is not the case that $\mathcal{D}(g \circ f)$ can be constructed solely from $\mathcal{D}g$ and $\mathcal{D}f$.

In this article, inspired by Elliott, I'll introduce an interpretation of AD from the perspective of differential graded algebras and point toward avenues for further work.

## Review

For a brief review and to establish some notation, consider a graded algebra

$$\Omega = \bigoplus_{k \ge 0} \Omega^k,$$

where $\Omega^0$ is a commutative algebra and $\Omega^k$ for $k \ge 1$ are $\Omega^0$-modules.

A derivative on a graded algebra is a nilpotent map $d$ (of degree $+1$ or $-1$, with $d^2 = 0$) satisfying the product rule

$$d(\alpha\beta) = (d\alpha)\beta + (-1)^{|\alpha|}\alpha\,(d\beta)$$

for any $\alpha, \beta \in \Omega$, where $|\alpha|$ is the degree of $\alpha$.

A differential graded algebra is a graded algebra $\Omega$ together with a derivative $d$.

In this article, we consider differential graded algebras over some underlying space $X$, in which case $\Omega^0$ is the algebra of functions with

$$(fg)(x) = f(x)\,g(x)$$

and $\Omega^k$ spanned by elements of the form

$$f_0\, df_1 \cdots df_k.$$
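As a quick sanity check of how the sign in the product rule interacts with nilpotency (my own worked example, not part of the original derivation), apply $d$ twice to a product of two functions $f, g \in \Omega^0$:

$$d(fg) = (df)\,g + f\,(dg),$$

$$d^2(fg) = (d^2 f)\,g + (-1)^{1}(df)(dg) + (df)(dg) + f\,(d^2 g) = 0.$$

The $(-1)^{|df|} = -1$ sign contributed by the degree-1 factor $df$ is exactly what cancels the cross terms, so $d^2 = 0$ on products follows from $d^2 = 0$ on functions.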

## Composability Revisited

Motivated by Elliott's concept of composability, we introduce a map $\mathcal{D}$ defined by

$$\mathcal{D}(\alpha) = (\alpha, d\alpha)$$

and let $\hat\Omega$ denote its image, i.e.

$$\hat\Omega = \{(\alpha, d\alpha) \mid \alpha \in \Omega\},$$

with projections

$$\pi_1(\alpha, d\alpha) = \alpha$$

and

$$\pi_2(\alpha, d\alpha) = d\alpha,$$

a product defined by

$$(\alpha, d\alpha)\,(\beta, d\beta) = \big(\alpha\beta,\ (d\alpha)\beta + (-1)^{|\alpha|}\alpha\,(d\beta)\big)$$

and derivative defined by

$$\hat d(\alpha, d\alpha) = (d\alpha, 0)$$

so that $\hat\Omega$ is also a differential graded algebra. In fact, when restricted to $\hat\Omega$, $\pi_1$ is the inverse of $\mathcal{D}$, so $\Omega$ and $\hat\Omega$ are effectively equivalent.

The object $\mathcal{D}(\alpha) = (\alpha, d\alpha)$ encodes the information required to make the above operations composable in the sense that

$$\mathcal{D}(\alpha\beta) = \big(\alpha\beta,\ (d\alpha)\beta + (-1)^{|\alpha|}\alpha\,(d\beta)\big)$$

and

$$\mathcal{D}(d\alpha) = (d\alpha, 0)$$

can be expressed entirely in terms of $\mathcal{D}(\alpha)$ and $\mathcal{D}(\beta)$.
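To make the pattern concrete in code, here is a minimal sketch (in Python rather than Julia, and with made-up names, not the actual StochasticDiff.jl API) of degree-0 pairs $(\alpha, d\alpha)$, i.e., ordinary forward-mode dual numbers, whose product is computed entirely from the two input pairs:

```python
import math
from dataclasses import dataclass

@dataclass
class Pair:
    """A degree-0 element packaged with its differential: (alpha, d alpha)."""
    val: float   # alpha, evaluated at a point
    dval: float  # d(alpha), evaluated at the same point

    def __mul__(self, other: "Pair") -> "Pair":
        # Product rule: d(alpha beta) = (d alpha) beta + alpha (d beta).
        # No sign appears because both factors have degree 0.
        return Pair(self.val * other.val,
                    self.dval * other.val + self.val * other.dval)

def D(f, df, x: float) -> Pair:
    """Lift a function with known derivative into its pair at the point x."""
    return Pair(f(x), df(x))

# Example: differentiate sin(x) * x^2 at x = 0.5 purely from the two pairs.
x = 0.5
p = D(math.sin, math.cos, x) * D(lambda t: t * t, lambda t: 2 * t, x)
# p.val  is sin(x) * x^2
# p.dval is cos(x) * x^2 + sin(x) * 2x
```

Note that `__mul__` never looks inside the original functions; the two pairs alone suffice, which is exactly the composability being claimed.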

## Maps

Motivated again by composability, consider two differential graded algebras $\Omega_X$ and $\Omega_Y$ over spaces $X$ and $Y$ together with a map $\varphi : X \to Y$ inducing a pullback map

$$\varphi^* : \Omega^0_Y \to \Omega^0_X$$

defined by

$$(\varphi^* f)(x) = f(\varphi(x))$$

for all

$$f \in \Omega^0_Y,$$

and we want to compute

$$d(\varphi^* f).$$

Composability means we should be able to compute this in terms of $\mathcal{D}(f)$ and some, yet to be determined, $\mathcal{D}(\varphi)$.

First expand

$$d(\varphi^* f) = d(f \circ \varphi)$$

and define

$$\varphi^*(df) = df \circ d\varphi,$$

where $d\varphi$ is a linear differential map and

$$d(f \circ \varphi) = df \circ d\varphi,$$

so that

$$d(\varphi^* f) = \varphi^*(df).$$

This motivates, in complete analogy, the definition

$$\mathcal{D}(\varphi) = (\varphi, d\varphi)$$

so that

$$\mathcal{D}(\varphi^* f) = \big(\varphi^* f,\ \varphi^*(df)\big)$$

is expressible entirely in terms of $\mathcal{D}(f)$ and $\mathcal{D}(\varphi)$.

Composition of maps is defined by

$$(\psi \circ \varphi)(x) = \psi(\varphi(x))$$

so that

$$\mathcal{D}(\psi \circ \varphi) = \big(\psi \circ \varphi,\ d(\psi \circ \varphi)\big)$$

with

$$d(\psi \circ \varphi) = d\psi \circ d\varphi.$$

Combining $\varphi$ with its differential map $d\varphi$ is exactly what is required to maintain composability, in the same manner that $\alpha$ must be combined with $d\alpha$ to maintain composability.
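The same bookkeeping can be sketched in code (again in Python with hypothetical names, not StochasticDiff.jl's): carry a one-dimensional map together with its differential, and composition needs nothing beyond the two pairs.

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass
class DMap:
    """A map packaged with its differential: (phi, d phi)."""
    phi: Callable[[float], float]
    dphi: Callable[[float], float]

def compose(psi: DMap, phi: DMap) -> DMap:
    """Chain rule: d(psi o phi)(x) = dpsi(phi(x)) * dphi(x)."""
    return DMap(lambda x: psi.phi(phi.phi(x)),
                lambda x: psi.dphi(phi.phi(x)) * phi.dphi(x))

sq = DMap(lambda x: x * x, lambda x: 2 * x)
sn = DMap(math.sin, math.cos)
h = compose(sn, sq)  # h(x) = sin(x^2), with h'(x) = cos(x^2) * 2x
```

This is precisely the compositional recipe Elliott asks for: `compose` consumes only the two (map, differential) pairs, never the maps in isolation.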

## Coordinates

Consider a neighborhood $U$ of a smooth $n$-dimensional space $X$ admitting coordinate functions $x^\mu$ and a coordinate map $x : U \to \mathbb{R}^n$ given by

$$x(p) = \big(x^1(p), \dots, x^n(p)\big).$$

A smooth function $f$ can be parameterized in terms of the coordinates (within $U$) via

$$f = f(x^1, \dots, x^n)$$

with differential given by (summation implied)

$$df = \partial_\mu f\, dx^\mu.$$

Although $\mathcal{D}(f) = (f, df)$ encodes what we need for composability, in order to implement this in code, we need to express $df$ in coordinates. To do so, write

$$df = \partial_\mu f\, dx^\mu,$$

where

$$\partial f = (\partial_1 f, \dots, \partial_n f)$$

is a row vector representing the components of $df$ in the coordinates $x^\mu$.

Similarly, consider a neighborhood $V$ of a smooth $m$-dimensional space $Y$ with coordinate functions $y^a$, a coordinate map $y : V \to \mathbb{R}^m$ given by

$$y(q) = \big(y^1(q), \dots, y^m(q)\big),$$

and a smooth map $\varphi : X \to Y$ parameterized (within the neighborhood $U$) via

$$y^a = \varphi^a(x^1, \dots, x^n),$$

where

$$J^a{}_\mu = \partial_\mu \varphi^a$$

is the Jacobian matrix.

Given a third map $\psi : Y \to Z$ with

$$z^i = \psi^i(y^1, \dots, y^m),$$

we have

$$\partial_\mu (\psi \circ \varphi)^i = (\partial_a \psi^i)(\partial_\mu \varphi^a).$$

Comparing this to

$$d(\psi \circ \varphi) = d\psi \circ d\varphi$$

confirms that

$$J_{\psi \circ \varphi} = J_\psi\, J_\varphi,$$

i.e., the Jacobian of a composition is the matrix product of the Jacobians, as it should.
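This coordinate statement is easy to check numerically. The sketch below (my own example maps, plain-Python finite differences) compares the Jacobian of a composition against the product of the individual Jacobians:

```python
import math

def phi(x):  # an arbitrary smooth example map R^2 -> R^2
    return [x[0] * x[1], math.sin(x[0])]

def psi(y):  # another arbitrary smooth example map R^2 -> R^2
    return [y[0] + y[1] ** 2, math.exp(y[1])]

def jacobian(f, x, h=1e-6):
    """Forward-difference Jacobian: J[i][mu] approximates d f^i / d x^mu."""
    fx = f(x)
    J = [[0.0] * len(x) for _ in fx]
    for mu in range(len(x)):
        xp = list(x)
        xp[mu] += h
        fxp = f(xp)
        for i in range(len(fx)):
            J[i][mu] = (fxp[i] - fx[i]) / h
    return J

def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x = [0.7, -0.2]
J_chain = jacobian(lambda p: psi(phi(p)), x)              # Jacobian of psi o phi
J_prod = matmul(jacobian(psi, phi(x)), jacobian(phi, x))  # J_psi * J_phi
# The two matrices agree up to finite-difference error.
```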

## Summary and Future Work

This article is basically the result of me trying to find a satisfactory way to explain the almost magical results demonstrated in StochasticDiff.jl. As reviewed in a few articles on this blog, e.g. here, stochastic calculus can be formulated as a noncommutative differential algebra, so that was a natural framework for me to view the results, but it is not really necessary. The main lesson here, clearly and beautifully enunciated by Conal Elliott, is the importance of composability in computation.

By enlarging $\Omega$ to $\hat\Omega$ via $\mathcal{D}$, which combines objects and their differentials, we demonstrated that the basic operations of differential graded algebras are composable and hence computable in a computer program.

For example, we now have

$$\mathcal{D}(\alpha\beta) = \big(\alpha\beta,\ (d\alpha)\beta + (-1)^{|\alpha|}\alpha\,(d\beta)\big)$$

and

$$\mathcal{D}(\varphi^* \alpha) = \big(\varphi^* \alpha,\ \varphi^*(d\alpha)\big),$$

which are expressible entirely in terms of $\mathcal{D}(\alpha)$, $\mathcal{D}(\beta)$ and $\mathcal{D}(\varphi)$.

In terms of category theory, there are commuting diagrams everywhere here, which indicates these results can be extended to large swaths of differential geometry and topology, opening the door to virtually unlimited possibilities for future work. You can tell my interests by scrolling through my articles here, so those are likely targets.

StochasticDiff.jl was just a proof of concept, but a compelling one. I will probably clean it up, generalize it a bit, and try to create some examples with changes of coordinates, etc.

This article is a bit mixed: some parts discuss higher-order forms, but the pullback map stopped at forms of degree 0 and 1, since that is what I need to start writing code. The path to higher-order forms (via tensor products) is clear, though, so that would be a nice topic for further work.

Another interesting avenue would be to consider vector fields and the interior product. I don’t see any major stumbling blocks to working that out. That would then lead to mixed covariant and contravariant tensors, Lie derivatives, Dirac equations, etc. all formulated in a composable / computable manner allowing for some interesting computer programs.

## References (or Lack Thereof)

For me, this was a passion project and I had a lot of fun working all this out on my own. I worked through nearly an entire notebook of scribbled, hieroglyphic-like notes. My only reference (other than Wikipedia to remind me of some basic definitions) was Conal Elliott's video and his paper, so please refer to that paper and the references therein. Also, please let me know if there are any additional references that should be credited here for prior work, and I will be happy to add them.