I recently published a simple Julia package
to demonstrate a natural extension of automatic differentiation (AD) to stochastic processes.
Writing the code was actually the easy part, but I struggled to come up with a satisfying explanation for how it works. Along the way, I arrived at an interpretation I’m happy with that applies to both standard AD and stochastic AD and, in fact, extends to other differential graded algebras and beyond.
Conal Elliott highlights the importance of AD being “compositional”. That is, the derivative of the product of two objects should be expressible in terms of those two objects.
To quote from his paper:
“Strictly speaking, Theorem 1 is not a compositional recipe for differentiating sequential compositions, i.e., it is not the case that $\mathcal{D}(g \circ f)$ can be constructed solely from $\mathcal{D}g$ and $\mathcal{D}f$.”
In this article, inspired by Elliott, I’ll introduce an interpretation of AD from the perspective of differential graded algebras and point toward avenues for further work.
For a brief review and to establish some notation, consider a graded algebra
$$\Omega = \Omega^0 \oplus \Omega^1 \oplus \Omega^2 \oplus \cdots,$$
where $\Omega^0$ is a commutative algebra and $\Omega^k$ for $k \ge 1$ are $\Omega^0$-modules.
A derivative on a graded algebra is a nilpotent ($d^2 = 0$) map $d$ (of degree $+1$ or $-1$) satisfying the product rule
$$d(\alpha\beta) = (d\alpha)\beta + (-1)^{|\alpha|}\alpha(d\beta)$$
for any $\alpha, \beta \in \Omega$, where $|\alpha|$ is the degree of $\alpha$.
A differential graded algebra $(\Omega, d)$ is a graded algebra $\Omega$ together with a derivative $d$.
In this article, we consider differential graded algebras $\Omega(X)$ over some underlying space $X$, in which case $\Omega^0(X)$ is the algebra of functions $f: X \to \mathbb{R}$ with
$$(fg)(p) = f(p)\,g(p).$$
Motivated by Elliott’s concept of composability, we introduce a map $\mathcal{J}: \Omega \to \Omega \times \Omega$ defined by
$$\mathcal{J}(\alpha) = (\alpha, d\alpha)$$
and let $\bar\Omega$ denote its image, i.e.
$$\bar\Omega = \mathcal{J}(\Omega) = \{(\alpha, d\alpha) \mid \alpha \in \Omega\},$$
a product defined by
$$(\alpha, d\alpha)\,(\beta, d\beta) = \big(\alpha\beta,\; (d\alpha)\beta + (-1)^{|\alpha|}\alpha(d\beta)\big)$$
and derivative defined by
$$\bar d(\alpha, d\alpha) = (d\alpha, 0),$$
so that $\bar\Omega$ is also a graded differential algebra. In fact, when restricted to $\bar\Omega$, the projection $(\alpha, d\alpha) \mapsto \alpha$ is the inverse of $\mathcal{J}$, so $\Omega$ and $\bar\Omega$ are effectively equivalent.
The object $\mathcal{J}(\alpha)$ encodes the information required to make the above operations composable in the sense that
$$\mathcal{J}(\alpha\beta) = \mathcal{J}(\alpha)\,\mathcal{J}(\beta)$$
can be expressed entirely in terms of $\mathcal{J}(\alpha)$ and $\mathcal{J}(\beta)$.
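In degree zero, this pairing is essentially forward-mode AD with dual numbers. Here is a minimal sketch in Python (rather than the package’s Julia; the names `Jet` and `J` are my own invention, not from StochasticDiff.jl):

```python
# Sketch of J(f) = (f, df) for degree-0 objects: each value carries its
# differential, and the product rule acts on the pairs alone.
from dataclasses import dataclass

@dataclass
class Jet:
    val: float   # alpha
    dif: float   # d(alpha), here a single real component

    def __mul__(self, other):
        # Product rule: d(ab) = (da) b + a (db); no sign since degree is 0.
        return Jet(self.val * other.val,
                   self.dif * other.val + self.val * other.dif)

def J(f, df, x):
    """Evaluate the pair (f, df) at the point x."""
    return Jet(f(x), df(x))

# J(fg) is computed entirely from J(f) and J(g):
f = J(lambda x: x**2, lambda x: 2 * x, 3.0)       # (9, 6)
g = J(lambda x: x**3, lambda x: 3 * x**2, 3.0)    # (27, 27)
print(f * g)   # Jet(val=243.0, dif=405.0), i.e. (x^5, 5x^4) at x = 3
```

Note that the caller never needs the symbolic form of $fg$: the pair for the product is assembled from the two pairs, which is exactly the composability being described.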
Motivated again by composability, consider two differential graded algebras $\Omega(X)$ and $\Omega(Y)$ over spaces $X$ and $Y$ together with a map $\phi: X \to Y$ inducing a pullback map
$$\phi^*: \Omega(Y) \to \Omega(X),$$
and we want to compute
$$\mathcal{J}(\phi^* f) = \big(\phi^* f,\; d(\phi^* f)\big).$$
Composability means we should be able to compute this in terms of $\mathcal{J}(f)$ and some, yet to be determined, $\bar\phi^*$. Since the pullback commutes with the derivative, we have
$$d(\phi^* f) = \phi^*(df),$$
where $\phi^*$ acting on differentials is a linear differential map and $\mathcal{J}(\phi^* f) = \big(\phi^* f, \phi^*(df)\big)$.
This motivates, in complete analogy, the definition
$$\bar\phi^*(\alpha, d\alpha) = \big(\phi^*\alpha,\; \phi^*(d\alpha)\big),$$
so that
$$\mathcal{J}(\phi^* f) = \bar\phi^*\,\mathcal{J}(f)$$
is expressible entirely in terms of $\mathcal{J}(f)$ and $\bar\phi^*$.
Composition of maps $\phi: X \to Y$ and $\psi: Y \to Z$ is defined by
$$\overline{(\psi \circ \phi)}{}^* = \bar\phi^* \circ \bar\psi^*.$$
Combining $\phi^*$ with its differential map is exactly what is required to maintain composability, in the same manner that $\alpha$ must be combined with $d\alpha$ to maintain composability.
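To make this concrete, here is a small Python sketch (my own illustration, one-dimensional for brevity, so the linear differential map is just a scalar factor) of composing a map bundled with its differential map, so that the chain rule needs only the two bundles:

```python
import math

# phi_bar = (phi, dphi): a map bundled with its differential map.
# Composition uses only the two bundles, never the symbolic composite.
def compose(psi_bar, phi_bar):
    psi, dpsi = psi_bar
    phi, dphi = phi_bar
    return (lambda x: psi(phi(x)),              # ordinary composition
            lambda x: dpsi(phi(x)) * dphi(x))   # chain rule on differentials

phi_bar = (math.sin, math.cos)                  # (sin, d sin = cos)
psi_bar = (lambda y: y * y, lambda y: 2 * y)    # (y^2, d(y^2) = 2y)
comp, dcomp = compose(psi_bar, phi_bar)
print(comp(1.0), dcomp(1.0))   # sin(1)^2 and 2 sin(1) cos(1)
```

The design point is the same as before: without the bundled differential map, `compose` would have no way to produce the differential of the composite from its inputs alone.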
Consider a neighborhood $U$ of a smooth $n$-dimensional space $X$ admitting coordinate functions $x^i \in \Omega^0(U)$ and a coordinate map $x: U \to \mathbb{R}^n$ given by
$$x(p) = \big(x^1(p), \dots, x^n(p)\big).$$
A smooth function $f \in \Omega^0(X)$ can be parameterized in terms of the coordinates (within $U$) via
$$f = \tilde f(x^1, \dots, x^n)$$
with differential given by (summation implied)
$$df = \partial_i \tilde f \, dx^i.$$
Although $\mathcal{J}(f) = (f, df)$ encodes what we need for composability, in order to implement this in code, we need to express $df$ in coordinates. To do so, write
$$df = (\partial f)\, dx,$$
where
$$\partial f = (\partial_1 \tilde f, \dots, \partial_n \tilde f)$$
is a row vector representing the components of $df$ in coordinates $x^i$.
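As a concrete check of the coordinate expressions (my own example, not one from the package), take $n = 2$ and $f = x^1 x^2$:

```latex
df = \partial_1 \tilde f \, dx^1 + \partial_2 \tilde f \, dx^2
   = x^2 \, dx^1 + x^1 \, dx^2,
\qquad
\partial f = \big( x^2 ,\; x^1 \big).
```

The row vector $\partial f$ is exactly the data a program stores alongside $f$ to represent $\mathcal{J}(f)$ in coordinates.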
Similarly, consider a neighborhood $V$ of a smooth $m$-dimensional space $Y$ with coordinate functions $y^j \in \Omega^0(V)$, a coordinate map $y: V \to \mathbb{R}^m$ given by
$$y(q) = \big(y^1(q), \dots, y^m(q)\big),$$
and a smooth map $\phi: X \to Y$ parameterized (within the neighborhood $U$) via
$$y^j \circ \phi = \phi^j(x^1, \dots, x^n),$$
so that
$$\phi^*(dy^j) = \partial_i \phi^j\, dx^i, \quad\text{i.e.}\quad \phi^*(dy) = (\partial\phi)\, dx,$$
where
$$\partial\phi = \big[\partial_i \phi^j\big]$$
is the Jacobian matrix.
Given a third map $\psi: Y \to Z$ with
$$z^k \circ \psi = \psi^k(y^1, \dots, y^m),$$
the composition $\psi \circ \phi$ satisfies
$$(\psi \circ \phi)^*(dz^k) = \partial_j \psi^k\, \partial_i \phi^j\, dx^i.$$
Comparing this to
$$\phi^*\big(\psi^*(dz^k)\big) = \phi^*\big(\partial_j \psi^k\, dy^j\big) = \partial_j \psi^k\, \partial_i \phi^j\, dx^i,$$
we see that the Jacobian of the composition is the matrix product of the Jacobians,
$$\partial(\psi \circ \phi) = (\partial\psi)(\partial\phi),$$
as it should be.
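The matrix form of the chain rule can be checked numerically. The following Python snippet (my own example maps, not from StochasticDiff.jl) multiplies hand-computed Jacobians and compares against the Jacobian of the composite computed directly:

```python
# Check that d(psi . phi) = (d psi)(d phi) as a matrix product.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x1, x2 = 2.0, 3.0
# phi(x1, x2) = (x1^2, x1 x2), with Jacobian:
Jphi = [[2 * x1, 0.0],
        [x2,     x1]]
# psi(y1, y2) = (y1 + y2, y1 y2), Jacobian evaluated at y = phi(x):
y1, y2 = x1**2, x1 * x2
Jpsi = [[1.0, 1.0],
        [y2,  y1]]
# psi(phi(x)) = (x1^2 + x1 x2, x1^3 x2), differentiated directly:
Jcomp = [[2 * x1 + x2,     x1],
         [3 * x1**2 * x2,  x1**3]]
print(matmul(Jpsi, Jphi) == Jcomp)   # True
```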
Summary and Future Work
This article is basically the result of me trying to find a satisfactory way to explain the almost magical results demonstrated in StochasticDiff.jl. As reviewed in a few articles on this blog, e.g. here, stochastic calculus can be formulated as a noncommutative differential graded algebra, so that was a natural framework for me to view the results, but it is not really necessary. The main lesson here, clearly and beautifully enunciated by Conal Elliott, is the importance of composability in computation.
By enlarging $\Omega$ to $\bar\Omega$ via the map $\mathcal{J}$, which combines objects and their differentials, we demonstrated that the basic operations of differential graded algebras are composable and hence computable in a computer program.
For example, we now have
$$\mathcal{J}(\alpha\beta) = \mathcal{J}(\alpha)\,\mathcal{J}(\beta) \quad\text{and}\quad \mathcal{J}(\phi^*\alpha) = \bar\phi^*\,\mathcal{J}(\alpha),$$
which are expressible entirely in terms of $\mathcal{J}(\alpha)$, $\mathcal{J}(\beta)$ and $\bar\phi^*$.
In terms of category theory, there are commuting diagrams everywhere here, which indicates these results can be extended to large swaths of differential geometry and topology, opening the door to virtually unlimited possibilities for future work. You can tell my interests by scrolling through my articles here, so those are likely targets.
StochasticDiff.jl was just a proof of concept, but a compelling one. I will probably clean it up, generalize it a bit, and try to create some examples with changes of coordinates, etc.
This article is a bit mixed: some parts discuss higher-degree forms, but the pullback map stopped with forms of degree 0 and 1 since that is what I need to start writing code. The path to higher-degree forms (via tensor products) is clear though, so that would be a nice topic for further work.
Another interesting avenue would be to consider vector fields and the interior product. I don’t see any major stumbling blocks to working that out. That would then lead to mixed covariant and contravariant tensors, Lie derivatives, Dirac equations, etc. all formulated in a composable / computable manner allowing for some interesting computer programs.
References (or Lack Thereof)
For me, this was a passion project and I had a lot of fun working all this out on my own. I worked through nearly an entire notebook of scribbled hieroglyphic-like notes. My only reference (other than Wikipedia to remind me of some basic definitions) was Conal Elliott’s video and his paper, so please refer to that paper and the references therein. Also, please let me know if there are any additional references that should be credited here for prior work, and I will be happy to add them.