Automatic Differentiation

I recently published a simple Julia package

to demonstrate a natural extension of automatic differentiation (AD) to stochastic processes.

Writing the code was actually the easy part, but I struggled to come up with a satisfying explanation for how it works. In the process, I came up with an interpretation I’m happy with that applies to both standard AD as well stochastic AD and, in fact, extends to other differential graded algebras and beyond.

The idea was inspired by a beautiful Microsoft Research talk given by Conal Elliott

Eliott highlights the importance of AD being “compositional”. That is, the derivative of the product of two objects should be expressible in terms of those two objects.

To quote from his article:

Strictly speaking, Theorem 1 is not a compositional recipe for differentiating sequential compositions, i.e., it is not the case D (g\circ f ) can be constructed solely from D g and D f.

In this article, inspired by Elliot, I’ll introduce an interpretation of AD from the perspective of differential graded algebras and point toward avenues for further work.


For a brief review and to establish some notation, consider a graded algebra

\displaystyle \Omega = \bigoplus_{p=0}^n \Omega^p,

where \Omega^0 is a commutative algebra and \Omega^p for p>0 are \Omega^0-modules.

A derivative d on a graded algebra \Omega is a nilpotent map (of degree +1 or -1) satisfying the product rule

\displaystyle d(\alpha\beta) = (d\alpha)\beta + (-1)^{|\alpha|} \alpha(d\beta)

for any \alpha\in\Omega^p and \beta\in\Omega^q, where |\alpha|=p is the degree of \alpha.

A differential graded algebra (\Omega,d) is a graded algebra \Omega together with a derivative d.

In this article, we consider differential graded algebras (\Omega(\mathcal{M}),d) over some underlying space \mathcal{M} in which case \Omega^0(\mathcal{M}) is the algebra of functions f: \mathcal{M}\to\mathbb{R} with

(f g)(p) := f(p)g(p)


(f+g)(p) := f(p) + g(p).

Composability Revisited

Motivated by Elliot’s concept of composability, we introduce a map \pi^*:\Omega\to\Omega\times d\Omega defined by

\displaystyle \pi^*(\alpha) := (\alpha,d\alpha)

and let \Omega^* denote its image, i.e.

\displaystyle \Omega^* := \pi^*\left[\Omega\right]\subset \Omega\times d\Omega

with projections

\pi_1(\alpha,d\alpha) = \alpha and \pi_2 (\alpha,d\alpha) = d\alpha,

a product defined by

\displaystyle \pi^*(\alpha) \pi^*(\beta) := \pi^*(\alpha \beta)

and derivative defined by

\displaystyle d \pi^*(\alpha) := \pi^*(d\alpha)

so that (\Omega^*,d) is also a graded differential algebra. In fact, when restricted to \Omega^*, \pi^* is the inverse of \pi_1 so (\Omega,d) and (\Omega^*,d) are effectively equivalent.

The object \pi^*(\alpha) encodes the information required to make the above operations composable in the sense that

d\left[\pi^*(\alpha)\right] = \pi^*(d\alpha) = (d\alpha,0) = (\pi_2\pi^*(\alpha),0)


d\left[\pi^*(\alpha)\wedge\pi^*(\beta)\right] = \pi^*\left[d(\alpha\wedge\beta)\right] = \pi^*\left[\pi_2\pi^*(\alpha)\wedge\pi_1\pi^*(\beta) + (-1)^{|\alpha|} \pi_1\pi^*(\alpha)\wedge\pi_2\pi^*(\beta)\right]

can be expressed entirely in terms of \pi^*(\alpha) and \pi^*(\beta).


Motivated again by composability, consider two differential graded algebras \Omega(\mathcal{M}) and \Omega(\mathcal{N}) over spaces \mathcal{M} and \mathcal{N} together with a map F:\mathcal{M}\to\mathcal{N} inducing a pullback map

\displaystyle F^*:\Omega^0(\mathcal{N})\to\Omega^0(\mathcal{M})

defined by

F^*(f) = f\circ F\in\Omega^0(\mathcal{M}) for all f\in\Omega^0(\mathcal{N})

and we want to compute

\displaystyle \pi^*\left[F^*(f)\right] = \pi^*(f\circ F).

Composability means we should be able to compute this in terms of \pi^*(f) and some, yet to be determined, \pi^*(F).

First expand

\displaystyle \pi^*(f\circ F) = (f\circ F,d(f\circ F))

and define

\displaystyle d(f\circ F) := df\circ dF := (dF)^*(df)

where dF is a linear differential map and


so that

\displaystyle \pi^*(f\circ F) = (f\circ F,df\circ dF) = (f,df)\circ(F,dF).

This motivates, in complete analogy, the definition

\pi^*(F) := (F,dF)

so that

\displaystyle \pi^*(F^*(f)) = (\pi_1\pi^*(f)\circ\pi_1\pi^*(F),\pi_2\pi^*(f)\circ\pi_2\pi^*(F))

is expressible entirely in terms of \pi^*(f) and \pi^*(F).

Composition of maps is defined by

\pi^*(F)\circ\pi^*(G) := (F\circ G, dF\circ dG)

so that

\pi^*(F)\circ\pi^*(G) = \pi^*(F\circ G)

with d(F\circ G) = dF\circ dG.

Combining F with its differential map dF is exactly what is required to maintain composability in the same manner that \alpha must be combined with d\alpha to maintain composability.


Consider a neighborhood \mathcal{U}_{\mathcal{M}}\subset\mathcal{M} of a smooth m-dimensional space \mathcal{M} admitting m coordinate functions x^\mu: \mathcal{U}_{\mathcal{M}}\to\mathbb{R} and a coordinate map \phi: \mathcal{U}_{\mathcal{M}}\to\mathbb{R}^m given by

\phi(p) = (x^1(p),\cdots,x^m(p)).

A smooth function f: \mathcal{M}\to\mathbb{R} can be parameterized in terms of the coordinates x^\mu (within \mathcal{U}_{\mathcal{M}}) via

\displaystyle f = f \circ (\phi^{-1} \circ \phi) = f \circ \phi^{-1}(x^1, \cdots ,x^m)

with differential given by (summation implied)

\displaystyle df = \frac {\partial f}{\partial x^\mu} dx^\mu.

Although \pi^*(f) = (f,df) encodes what we need for composability, in order to implement this in code, we need to express \pi^*(f) in coordinates. To do so, write

\displaystyle \pi^*(f) = (f,df) := \left(f,\left[\frac{\partial f}{\partial\phi}\right]\right)_\phi,


\displaystyle \left[\frac{\partial f}{\partial \phi}\right]  := \begin{bmatrix}\frac{\partial f}{\partial x^1} & \cdots & \frac{\partial f}{\partial x^m}\end{bmatrix}

is a row vector representing the components \frac{\partial f}{\partial x^\mu} of df in coordinates \phi = (x^1,\cdots,x^m).

Similarly, consider a neighborhood \mathcal{U}_{\mathcal{N}}\subset\mathcal{N} of a smooth n-dimensional space \mathcal{N} with n coordinate functions y^\nu: \mathcal{U}_{\mathcal{N}}\to\mathbb{R}, a coordinate map \psi: \mathcal{U}_{\mathcal{N}}\to\mathbb{R}^n given by

\displaystyle \psi(p) = (y^1(p),\cdots,y^n(p))

and a smooth map F: \mathcal{M}\to\mathcal{N} parameterized (within the neighborhood \mathcal{U}_{\mathcal{M}}\cap F^{-1}[\mathcal{U}_{\mathcal{N}}]) via

\displaystyle \pi^*(F) = (F,dF) := \left( F, \left[\frac{\partial(\psi\circ F)}{\partial \phi}\right]  \right)_{(\psi,\phi)},


\displaystyle \left[\frac{\partial(\psi\circ F)}{\partial \phi}\right]  := \begin{bmatrix} \frac{\partial(y^1\circ F)}{\partial x^1} & \cdots & \frac{\partial(y^1\circ F)}{\partial x^m} \\ \vdots & \ddots & \vdots \\ \frac{\partial(y^n\circ F)}{\partial x^1} & \cdots & \frac{\partial(y^n\circ F)}{\partial x^m} \end{bmatrix}

is the Jacobian matrix.

Given a third map G: \mathcal{L}\to\mathcal{M} with

\displaystyle \pi^*(G) = \left(G, \left[\frac{\partial(\chi\circ G)}{\partial \phi}\right]\right)_{(\chi,\phi)}

we have

\displaystyle \pi^*(F)\circ\pi^*(G) := \left(F\circ G,\left[\frac{\partial(\chi\circ F)}{\partial \phi}\right] \left[\frac{\partial(\phi\circ G)}{\partial \psi}\right] \right)_{(\chi,\psi)}.

Comparing this to

\displaystyle \pi^*(F\circ G) = \left(F\circ G,\left[\frac{\partial(\chi\circ F\circ G)}{\partial \psi}\right]\right)_{(\chi,\psi)}

confirms that

\pi^*(F)\circ\pi^*(G) = \pi^*(F\circ G)

as it should.

Summary and Future Work

This article is basically the result of me trying to find a satisfactory way to explain the almost magical results demostrated in StochasticDiff.jl. As reviewed in a few articles on this blog, e.g. here, stochastic calculus can be formulated as a noncommutative differential algebra so that was a natural framework for me to view the results, but it not really necessary. The main lesson here that was clearly and beautifully enunciated by Conal Elliott was the importance of composability in computation.

By enlarging \Omega to \Omega^* via \pi^*:\Omega\to\Omega^* which combines objects and their differentials, we demonstrated that the basic operations of differential graded algebras are composable and hence computable in a computer program.

For example, we now have

\displaystyle \pi^*\left[d(\alpha\beta)\right] = d\left[\pi^*(\alpha)\pi^*(\beta)\right] = \left[d\pi^*(\alpha)\right]\pi^*(\beta) + (-1)^{|\alpha|} \pi^*(\alpha)\left[d\pi^*(\beta)\right]


\displaystyle \pi^*\left[F^*(f)\right] = \pi^*(f)\circ\pi^*(F)

which are expressible entirely in terms of \pi^*(\alpha), \pi^*(\beta) and \pi^*(F).

In terms of category theory, there are commuting diagrams everywhere here, which indicates these results can be extended to large swaths of differential geometry and topology opening the door to virtually unlimited possibilities for future work. You can tell my interests by scrolling through my articles here so those are likely targets.

SochasticDiff.jl was just a proof of concept, but a compelling one. I will probably clean it up and generalize it a bit and try to create some examples with change of coordinates etc.

This article is a bit mixed. Some parts discuss higher order forms, but the pullback map stopped with forms of degree 0 and 1 since that is what I need to start writing code. The path to higher order forms (via tensor product) is clear though, so that would be a nice topic for further work.

Another interesting avenue would be to consider vector fields and the interior product. I don’t see any major stumbling blocks to working that out. That would then lead to mixed covariant and contravariant tensors, Lie derivatives, Dirac equations, etc. all formulated in a composable / computable manner allowing for some interesting computer programs.

*References (or Lack Thereof)

For me, this was a passion project and I had a lot of fun working all this out on my own. I worked through nearly an entire notebook of scribbled hieroglyphic-like notes. My only reference (other than Wikipedia to remind me of some basic definitions) was Conal Elliot’s video and his paper, so please refer to that paper and references therein. Also, please let me know if there are any additional references that should be given credit here for prior work and I am happy to add them.


Discrete Black-Scholes Model

This post is part of a series

In the previous post, we found discrete geometric Brownian motion 

dS = \mu S dt + \sigma S dx

has the closed-form solution

\displaystyle S(i,j) = S(0,0) R^{\frac{j+i}{2}} L^{\frac{j-i}{2}}


R = 1+\mu\Delta t+\sigma\Delta x\quad\text{and}\quad L = 1+\mu\Delta t+\sigma\Delta x

In this post, I revisit some results presented in 

back in 2004.

Risk-Free Bond Price

The price of a risk-free bond will be modeled as

dB = r B dt

which has the closed-form solution

\boxed{B(i,j) = B(0,0) (1+r\Delta t)^j.}

Stock Price

The price of a stock will be modeled as a geometric Brownian motion

dS = \mu S dt + \sigma S dx.

However, we will be working in a risk-neutral measure meaning the drift will be the risk-free rate so that

dS = r S dt + \sigma S d\tilde x,


\displaystyle d\tilde x = dx - \frac{\mu-r}{\sigma} dt.

In what follows, we assume risk-neutral measure and write simply

dS = r S dt + \sigma S dx

with no\tilde{}so 

\displaystyle \boxed{S(i,j) = S(0,0) (1+r\Delta t+\sigma\Delta x)^{\frac{j+i}{2}} (1+r\Delta t-\sigma\Delta x)^{\frac{j-i}{2}}.}

Self Financing

A portfolio\Pican be constructed consisting of an optionV, a stockSand a risk-free bondB,i.e.

\displaystyle \Pi = \alpha V + \delta S + \beta B,

where\alpha,\delta,\betaare units held in the respective security.

The Black-Scholes model assumes the portfolio is self financing, i.e. to increase units in one security another security needs to be sold so no money flows into or out of the portfolio. A self-financing portfolio has a simple expression in discrete stochastic calculus

(d\alpha) V + (d\delta) S + (d\beta) B = 0,

i.e. the total change in value of the portfolio due to trading activity is zero.

Consequently, if\Pi is self financing, then

\boxed{d\Pi = \alpha dV + \delta dS + \beta dB.}

Hedging Strategy

The final ingredient of the Black-Scholes model is that there is a no-arbitrage hedging strategy\delta that eliminates the “risk” of the portfolio so that

d\Pi = r\Pi dt.

For the riskydxcomponent ofd\Piterm to vanish, the component of\mathbf{e}^+must equal the component of\mathbf{e}^-.

Expandingd\Piand comparing components gives the hedging strategy

\displaystyle \boxed{\delta(j) = -\alpha(j) \frac{V(i+1,j+1)-V(i-1,j+1)}{S(i+1,j+1)-S(i-1,j+1)}.}

Discrete Black-Scholes Equation

Inserting the hedging strategy above into

d\Pi = r\Pi dt = \alpha dV + \delta dS + \beta dB

and expanding results in the discrete Black-Scholes equation

\displaystyle \boxed{V(i,j) = \frac{1}{1+r\Delta}\left[V(i+1,j+1) p^+(j) + V(i-1,j+1) p^-(j)\right],}


\displaystyle p^+(j) = \frac{(1+r\Delta t) S(i,j) - S(i-1,j+1)}{S(i+1,j+1) - S(i-1,j+1)}

andp^-(j) = 1-p^+(j).

The discrete Black-Scholes equation is essentially the Cox-Ross-Rubinstein model.


Since we have the closed-form expression forS, there are several simplification that result. For instance, we have

S(i+1,j+1) - S(i-1,j+1) = 2\sigma\Delta x S(i,j)


(1+r\Delta t) S(i,j) - S(i-1,j+1) = \sigma\Delta x S(i,j)

so that the hedging strategy simplifies to

\displaystyle \boxed{\delta(j) = -\alpha(j)\frac{V(i+1,j+1) - V(i-1,j+1)}{2\sigma\Delta x S(i,j)}}

andp^+(j) = p^-(j) = 1/2so that

\displaystyle \boxed{V(i,j) = \frac{V(i+1,j+1)+V(i-1,j+1)}{2(1+r\Delta t)}.}

Discrete Stochastic Differential Equations

This post is part of a series

In this post, I begin looking at discrete stochastic differential equations (DSDEs).

Recall that in the continuum, stochastic differential equations (SDEs) are of the form

dS = \mu(t,S) dt + \sigma(t,S) dx.

When transcribing continuum models to the discrete setting, the first challenge is to determine how to write down the DSDEs. Due to the noncommutative nature of discrete stochastic calculus, 

dS = dt \mu(t,S) + dx \sigma(t,S)

with 0-forms on the right is a different model than above with 0-forms on the left. In fact, we could consider linear combinations such as

dS=\kappa\left[\mu(t,S)dt+\sigma(t,S) dx\right]+(1-\kappa)\left[dt\mu(t,S)+dx\sigma(t,S)\right].

Consider the continuum special case of geometric Brownian motion with constant coefficients

dS = \mu S dt + \sigma S dx

with known solution

\displaystyle S(t,x) = S(0,0) \exp\left[\left(\mu - \frac{\sigma^2}{2}\right) t + \sigma x\right].

With the above solution and switching to the context of noncommutative stochastic calculus, the differential expressed in left components is given by

\begin{aligned} dS&=\left(\partial_t S+\frac{1}{2}\partial_{xx}S\right)dt+\left(\partial_x S\right)dx\\&=\mu S dt+\sigma S dx\end{aligned}

as expected. In contrast, expressing the model in right-component form results in

\begin{aligned} dS&=dt\left(\partial_t S-\frac{1}{2}\partial_{xx}S\right)+dx\left(\partial_x S\right)\\&=dt S(\mu-\sigma)+dx S\sigma\end{aligned},

which although correct and equivalent to the above, the right-component form is not the way we would expect the continuum model to be transcribed. With this as our motivation, we will take the left-component

dS = \mu(t,S) dt + \sigma(t,S) dx

to be the way we transcribe continuum models.

Discrete Geometric Brownian Motion

To solve discrete geometric Brownian motion, first express it in terms of the bases\mathbf{e}^+and\mathbf{e}^-

\begin{aligned}dS &= \mu S dt + \sigma S dx\\&=\mu S\Delta t(\mathbf{e}^++\mathbf{e}^-) + \sigma S\Delta x(\mathbf{e}^+-\mathbf{e}^-)\\&=(\mu\Delta t + \sigma\Delta x) S\mathbf{e}^+ + (\mu\Delta t - \sigma \Delta x)S\mathbf{e}^-\end{aligned}.

The update expressions can be written immediately as

S(i+1,j+1) = \left[1+\mu\Delta t(j)+\sigma\Delta x(j)\right] S(i,j)


S(i-1,j+1) = \left[1+\mu\Delta t(j)-\sigma\Delta x(j)\right] S(i,j)

so that the solution at any node can be expressed as

\displaystyle \boxed{S(i,j) = S(0,0)R^{\frac{j+i}{2}} L^{\frac{j-i}{2}}}


R=\left[1+\mu\Delta t(j)+\sigma\Delta x(j)\right]


L=\left[1+\mu\Delta t(j)-\sigma\Delta x(j)\right].

The continuum solution can be written in a similar form as

S(i,j) = S(0,0) R_c^{\frac{j+i}{2}} L_c^{\frac{j-i}{2}},


\begin{aligned} R_c &= \exp\left[\left(\mu-\frac{\sigma^2}{2}\right)\Delta t + \sigma\Delta x\right]\\&= 1+\left(\mu-\frac{\sigma^2}{2}\right)\Delta t + \sigma\Delta x + \frac{1}{2}\left[\sigma^2(\Delta x)^2+O\left((\Delta x)^3\right)\right]\\&=R + O((\Delta x)^3)\end{aligned}


\begin{aligned} L_c &= \exp\left[\left(\mu-\frac{\sigma^2}{2}\right)\Delta t - \sigma\Delta x\right]\\&= 1+\left(\mu-\frac{\sigma^2}{2}\right)\Delta t - \sigma\Delta x + \frac{1}{2}\left[\sigma^2(\Delta x)^2+O\left((\Delta x)^3\right)\right]\\&=L + O((\Delta x)^3).\end{aligned}

Discrete Ornstein–Uhlenbeck Process

Consider the discrete Ornstein-Uhlenbeck process

\begin{aligned} dS &= \theta (\mu - S) dt + \sigma dx \\&=\left[\theta(\mu-S)\Delta t+\sigma\Delta x\right]\mathbf{e}^++\left[\theta(\mu-S)\Delta t-\sigma\Delta x\right]\mathbf{e}^-\end{aligned}

with update expressions

S(i+1,j+1) = S(i,j) \left[1-\theta\Delta t(j)\right]+\theta\mu\Delta t(j)+\sigma\Delta x(j)


S(i-1,j+1) = S(i,j) \left[1-\theta\Delta t(j)\right]+\theta\mu\Delta t(j)-\sigma\Delta x(j).


R_j(S(i,j)) = S(i,j)\left[1-\theta\Delta t(j)\right]+\theta\mu\Delta t(j)+\sigma\Delta x(j)


L_j(S(i,j)) = S(i,j)\left[1-\theta\Delta t(j)\right]+\theta\mu\Delta t(j)-\sigma\Delta x(j).

For consistency, we require

L_{j+1} \circ R_j = R_{j+1} \circ L_j.

Imposing this consistency condition gives constraints on the grid spacing, i.e.

\displaystyle \Delta x(j+1) = \frac{-1+\sqrt{1+4\theta\left[\Delta x(j)\right]^2}}{2\theta\Delta x(j)}

implying dynamic grid spacing.

General Strategy

Armed with the above examples, we can develop a general strategy for solving discrete stochastic differential equations defined by the general process

\begin{aligned} dS&=\mu(t,S) dt+\sigma(t,S) dx\\&=\left[\mu(t,S)\Delta t+\sigma(t,S)\Delta x\right]\mathbf{e}^+ +\left[\mu(t,S)\Delta t-\sigma(t,S)\Delta x\right]\mathbf{e}^-.\end{aligned}

Write down the update expressions as simply

\begin{aligned} S(i+1,j+1)&=R_j(S(i,j))\\&=S(i,j)+\mu\left[j,S(i,j)\right]\Delta t(j)+\sigma\left[j,S(i,j)\right]\Delta x(j)\end{aligned}


\begin{aligned} S(i-1,j+1)&=L_j(S(i,j))\\&=S(i,j)+\mu\left[j,S(i,j)\right]\Delta t(j)-\sigma\left[j,S(i,j)\right]\Delta x(j).\end{aligned}

Finally, impose the consistency condition

L_{j+1} \circ R_j = R_{j+1} \circ L_j.

Dynamic Grids

This post is part of a series

In the previous post of this series, I defined discrete stochastic calculus as the discrete calculus on a binary tree with\Delta t = (\Delta x)^2 so that the coordinate basis 1-formsdxanddt satisfy the commutative relations

  • [dx,x] = dt
  • [dt,t] = \Delta t dt
  • [dx,t] = [dt,x] = \Delta t dx.

A major assumption there was that the grid spacings\Delta tand\Delta xare constants. It turns out that to solve some discrete stochastic differential equations, we need to relax this assumption. In this post, I generalize discrete stochastic calculus to dynamic grids.

Recall the binary tree

We define the coordinatesxandt as discrete 0-forms

\displaystyle x = \sum_{(i,j)} x(i,j) \mathbf{e}^{(i,j)}\quad\text{and}\quad t = \sum_{(i,j)} t(i,j) \mathbf{e}^{(i,j)},

where we set

x(i\pm1,j+1) = x(i,j) \pm \Delta x(j)


t(i\pm1,j+1) = t(i,j) + \Delta t(j),

i.e. the grid spacings are fixed for a given time, but are allowed to change in time. Therefore,

dx = \Delta x (\mathbf{e}^+ - \mathbf{e}^-)

dt = \Delta t (\mathbf{e}^+ + \mathbf{e}^-)


\displaystyle \Delta x = \sum_{(i,j)} \Delta x(j) \mathbf{e}^{(i,j)}\quad\text{and}\quad\Delta t = \sum_{(i,j)} \Delta t(j) \mathbf{e}^{(i,j)}.

Since\Delta xand\Delta tare no longer constant 0-forms, they do not commute with\mathbf{e}^+and\mathbf{e}^-.

Straightforward calculations show the generalized coordinates satisfy the commutation relations

[\mathbf{e}^\pm,x] = \pm \Delta x\mathbf{e}^\pm\quad\text{and}\quad[\mathbf{e}^\pm,t] = \Delta t\mathbf{e}^\pm.

From there, it is a simple exercise to demonstrate that

[dx,x] = (\Delta t)^{-1} (\Delta x)^2 dt

[dx,t] = [dt,x] = \Delta t dx

[dt,t] = \Delta t dt,


\displaystyle (\Delta t)^{-1} = \sum_{(i,j)} \frac{1}{\Delta t(j)} \mathbf{e}^{(i,j)}.


\Delta t = (\Delta x)^2

results in the commutation relations

[dx,x] = dt

[dx,t] = [dt,x] = \Delta t dx

[dt,t] = \Delta t dt.

In the continuum limit\Delta t \to 0, these reduce to

[dx,x] = dt

[dx,t] = [dt,x] = 0

[dt,t] = 0

giving the noncommutative stochastic calculus.

When the grid spacing\Delta x(j)and\Delta t(j)are constants, the above reduces to the previously considered case.

In subsequents posts, I will adopt this more general definition of discrete stochastic calculus on dynamic grids.

Magic Time Step for the Heat Equation

Wave Equation

In the finite-difference method, there is a fairly remarkable property of the (1+1)-dimensional wave equation

\partial_t^2 \phi - c^2 \partial_x^2\phi = 0.

Replacing\partial_t^2and\partial_x^2with central-difference approximations, a node at\phi\left(i,j\right) may be computed as

\displaystyle \boxed{\phi(i,j+1) = \kappa\phi(i+1,j) + \kappa\phi(i-1,j) + 2(1-\kappa)\phi(i,j) - \phi(i,j-1)}

where\phi(i,j)is notation for\phi(i\Delta x,j\Delta t)and\kappa = \frac{(c\Delta t)^2}{(\Delta x)^2}.

The stencil for updating\phi(i,j+1)is illustrated below

Stencil 1

However, when

\Delta t = \frac{1}{c}\Delta x

then\kappa=1and the\phi(i,j)term drops out so the update expression reduces to

\displaystyle \boxed{\phi(i,j+1) = \phi(i+1,j) - \phi(i,j-1) + \phi(i-1,j)}

and does not depend on any physical constants.

This time step is known as the “magic time step” because, the above represents an exact solution to the (1+1)-dimensional wave equation, i.e. for given initial conditions, the wave will propagate exactly to within machine precision.

The stencil for the (1+1)-dimensional wave equation when using the magic time steps reduces to the following:

Stencil 2

This stencil is more reminiscent of a binary tree.

Heat Equation

Next consider a finite-difference approximation to the (1+1)-dimensional heat equation

\partial_t\phi - \frac{\sigma^2}{2} \partial_x^2\phi = 0

using a forward-difference approximation for\partial_tand a central-difference approximation for\partial_x^2. The resulting update equation is given by

\displaystyle \boxed{\phi(i,j+1) = \frac{\kappa}{2} \phi(i+1,j) + \frac{\kappa}{2} \phi(i-1,j) + (1-\kappa)\phi(i,j)}

where\kappa = \frac{\sigma^2\Delta t}{(\Delta x)^2}.

The stencil for this update expression is given by

Stencil 3

Similar to the wave equation, when setting

\Delta t = \frac{1}{\sigma^2} (\Delta x)^2

then\kappa=1and the\phi(i,j)term drops out so the update expression reduces to

\displaystyle \boxed{\phi(i,j+1) = \frac{1}{2}\left[\phi(i+1,j) + \phi(i-1,j)\right]}

and does not depend on any physical constants. The stencil also reduces to

Stencil 4

This stencil is reminiscent of a binary tree.

The above time step for the heat equation is also “magic”.

The reduced update expression has a closed form (Greens function) solution being the the normalized binomial coefficient, i.e. if\phi(0,0) = 1then

\displaystyle \phi(i,j) = \frac{1}{2^{j}} {j\choose i}.

Noncommutative Geometry and Navier-Stokes Equation

Noncommutative Geometry

In this section, I briefly review some elementary concepts of noncommutative geometry on commutative algebras primarily to collect some notation to be used later on. For a deeper look into the topic, I highly recommend the largely self-contained review articles:


Consider an (n+1)-dimensional differential graded algebra

\displaystyle \Omega = \bigoplus_{i=0}^n \Omega^i,\quad d:\Omega^i\to\Omega^{i+1}

over a commutative algebra of 0-forms\Omega^0with basis 1-forms d x^0, ... , d x^n\in\Omega^1. The core concept here lies in the fact 0-forms and 1-forms do not commute so any 1-form can be written equivalently in either left- or right-component forms

\displaystyle \alpha = \overleftarrow{\alpha_\mu} d x^\mu = d x^\mu \overrightarrow{\alpha_\mu}

respectively. In particular, the differential of a 0-form can be written as

\displaystyle d f = \left(\overleftarrow{\partial_\mu} f\right) d x^\mu = d x^\mu \left(\overrightarrow{\partial_\mu} f\right),

where\overleftarrow{\partial_\mu}: \Omega^0 \to \Omega^0 and\overrightarrow{\partial_\mu}: \Omega^0 \to \Omega^0 are not necessarily partial derivatives.

The basis 1-forms satisfy the commutation relations

\displaystyle [d x^\mu,x^\nu] = \overleftarrow{C^{\mu\nu}_{\lambda}} d x^\lambda = d x^\lambda\overrightarrow{C^{\mu\nu}_{\lambda}}.

In this post, I assume\overleftarrow{C^{\mu\nu}_{\lambda}}and\overrightarrow{C^{\mu\nu}_{\lambda}}are constant 0-forms so that\overleftarrow{C^{\mu\nu}_{\lambda}}= \overrightarrow{C^{\mu\nu}_{\lambda}}and write both simply asC^{\mu\nu}_{\lambda}. Note, however, that constant coefficients are not a restriction of the general formalism.

Since [d x^\mu, x^\nu] = [d x^\nu, x^\mu], it follows that

\displaystyle C^{\mu\nu}_{\lambda} = C^{\nu\mu}_{\lambda},

i.e.C^{\mu\nu}_{\lambda}is symmetric in the upper indices.

More generally,

\left[d f,g\right] = \left[d g,f\right] = C^{\kappa\lambda}_{\mu}\left(\overleftarrow{\partial_\kappa} f\right)\left(\overleftarrow{\partial_\lambda} g\right) d x^\mu = d x^\mu C^{\kappa\lambda}_{\mu}\left(\overrightarrow{\partial_\kappa} f\right)\left(\overrightarrow{\partial_\lambda} g\right)

so that the product rule results in

\begin{aligned} d(fg) &= (d f)g + f(d g) \\ &= g(d f) + f(d g) + [d f,g] \\ &= \left[ \left(\overleftarrow{\partial_\mu} f\right) g + f\left(\overleftarrow{\partial_\mu} g\right) + C^{\kappa\lambda}_{\mu} \left(\overleftarrow{\partial_\lambda} f\right)\left(\overleftarrow{\partial_\kappa} g\right) \right] d x^\mu \end{aligned}


\displaystyle \overleftarrow{\partial_\mu}\left(f g\right) = \left(\overleftarrow{\partial_\mu} f\right) g + f\left(\overleftarrow{\partial_\mu} g\right) + C^{\kappa\lambda}_{\mu} \left(\overleftarrow{\partial_\lambda} f\right)\left(\overleftarrow{\partial_\kappa} g\right) .


\displaystyle \overrightarrow{\partial_\mu}\left(f g\right) = \left(\overrightarrow{\partial_\mu} f\right) g + f\left(\overrightarrow{\partial_\mu} g\right) - C^{\kappa\lambda}_{\mu}\left(\overrightarrow{\partial_\lambda} f\right)\left(\overrightarrow{\partial_\kappa} g\right) .

Everything above this point holds for any commutative associative algebra\Omega^0with no assumption of smoothness. However, if\Omega^0 consists of smooth 0-forms on\mathbb{R}^{n+1}, the right- and left-partial derivatives can be expressed as

\displaystyle \overleftarrow{\partial_\mu} = \partial_\mu + \sum_{r=2}^{\infty} \frac{1}{r!} C^{i_1i_2}_{j_1}C^{i_3j_1}_{j_2}\dots C^{i_rj_{r-2}}_k \partial_{i_1}\dots\partial_{i_r}


\displaystyle \overrightarrow{\partial_\mu} = \partial_\mu + \sum_{r=2}^{\infty} \frac{(-1)^{r-1}}{r!} C^{i_1i_2}_{j_1}C^{i_3j_1}_{j_2}\dots C^{i_rj_{r-2}}_k \partial_{i_1}\dots\partial_{i_r}.

Stochastic Calculus

In the paper

a reformulation of stochastic calculus in terms of noncommutative geometry was presented, where\Omega^0consists of smooth 0-forms on\mathbb{R}^2. In this (1+1)-dimensional case, we setx^0 = tandx^1 = x with commutation relations

\displaystyle [dx,t] = [dt,x] = [dt,t] = 0and\displaystyle [dx,x] = \sigma^2 dt.

In other words

C^{xx}_t = \sigma^2

and all other coefficients vanish. In this case, we have

\overleftarrow{\partial_x} = \overrightarrow{\partial_x} = \partial_x,

where\partial_xis the partial derivative with respect tox,

\overleftarrow{\partial_t} = \partial_t + \frac{\sigma^2}{2} \partial^2_x,

where\partial_t is the partial derivative with respect tot

so that

\begin{aligned} df &= \left(\partial_t + \frac{\sigma^2}{2}\partial^2_x f\right) dt + \left(\partial_x f\right) dx \\ &= dt\left(\partial_t + \frac{\sigma^2}{2}\partial^2_x f\right) + dx\left(\partial_x f\right) - [dx,(\partial_x f)] \\ &= dt\left(\partial_t + \frac{\sigma^2}{2}\partial^2_x f\right) + dx\left(\partial_x f\right) - \sigma^2 dt\left(\partial^2_x f\right) \\ &= dt\left(\partial_t - \frac{\sigma}{2}\partial^2_x f\right) + dx\left(\partial_x f\right). \end{aligned}

and we see that

\overrightarrow{\partial_t} = \partial_t - \frac{\sigma^2}{2} \partial^2_x.

The Ito formula of stochastic calculus emerges naturally as left-components of the differential dfand the heat equation emerges whendf is expressed in right components.

Burgers Equation

In the paper

Burgers (1+1)-dimensional equation was derived from noncommutative geometry as a zero-curvature condition on\mathbb{R}^2. In the post

I corrected a minor sign error in the above article and indicated a relation to the Navier-Stokes equation. A simplified derivation is reproduced here.

In this section, let

\displaystyle [dx,t] = [dt,x] = [dt,t] = 0and\displaystyle [dx,x] = \eta dt.


A = dx \left(k u\right)

denote a connection 1-form, wherekis a constant anduis a 1-form so that

\begin{aligned} dA &= -dx \left(kdu\right) \\ &=  dt dx \left[k \left(\partial_t u - \frac{\eta}{2} \partial^2_x u\right)\right] \end{aligned}


\begin{aligned} AA &= k^2 dx\left(u dx\right) u \\ &=  k^2 dx\left(dx u - [dx, u]\right) u \\ &= dt dx \left[k^2\eta\left(\partial_x u\right) u\right]. \end{aligned}

Putting the above together and settingk = \frac{1}{\eta}results in the curvature 2-form

F = dA + AA = dt dx \frac{1}{\eta} \left[ \partial_t u - \frac{\eta}{2} \partial^2_x u + \left(\partial_x u\right) u. \right]

If the curvature 2-form vanishes, we have

\boxed{\partial_t u - \frac{\eta}{2} \partial^2_x u + \left(\partial_x u\right) u = 0}

which is the (1+1)-dimensional Burger’s equation.

Furthermore, if the curvature 2-form vanishes, it implies we can write the connection 1-form as a pure gauge

A = \phi^{-1} d\phi = -\left(d\phi^{-1}\right) \phi = -\left(d\psi\right)\psi^{-1},

where\psi = \phi^{-1}. Expanding the differential results in

A = -\left[dt \left(\partial_t\psi - \frac{\eta}{2}\partial^2_x\psi\right) + dx \left(\partial_x\psi\right)\right] \psi^{-1}


\boxed{u = -\eta\left(\partial_x\psi\right)\psi^{-1}\quad\text{and}\quad \partial_t \psi - \frac{\eta}{2}\partial^2_x\psi = 0.}

The above relations are referred to as the Cole-Hopf transformation.

This was trivially extended to (3+1)-dimensions in the post

Remark: I personally find the fact Burgers equation pops out like this and the Cole-Hopf transformation is almost a tautology mind boggling.

Navier-Stokes Equation

In this section, consider a (3+1)-dimensional noncommutative geometry withx^0 = tand implied sums run over only the spatial dimensionsx^\mu, \mu = 1,2,3and

\displaystyle [dx^\mu,t] = [dt,x^\mu] = [dt,t] = 0and\displaystyle [dx^\mu,x^\mu] = \eta g^{\mu\nu} dt

so that

df = dt \left(\partial_t f - \frac{\eta}{2} \nabla^2 f\right) + dx^\mu \left(\partial_\mu f\right)


\nabla^2 f = g^{\mu\nu} \partial_\mu\partial_\nu.

In analogy with Burgers equation, consider a 1-form connection

A = \frac{1}{\eta} \left(-dt p + dx^\mu u_\mu\right)

where\etais constant andp,u_\mu\in\Omega^0so that

dA = dt dx^\mu \frac{1}{\eta}\left(\partial_t u_\mu - \frac{\eta}{2}\nabla^2 u_\mu + \partial_\mu p\right) + dx^\mu dx^\nu \frac{1}{\eta}\left(\partial_\mu u_\nu\right)


AA = dt dx^\mu \frac{1}{\eta^2}\left[\eta\left(u\cdot\nabla\right)u_\mu\right]


\left(u\cdot\nabla\right) = g^{\kappa\lambda} u_\kappa \partial_\lambda.

Putting the above together results in the curvature 2-form

\begin{aligned} F &= dA + AA \\ &= dt x^\mu \frac{1}{\eta} \left[\partial_t u_\mu + \left(u\cdot\nabla\right)u_\mu - \frac{\eta}{2}\nabla^2 u_\mu + \partial_\mu p\right] + dx^\mu dx^\nu \frac{1}{\eta} \left[\frac{1}{2}\left(\partial_\mu u_\nu - \partial_\nu u_\mu\right)\right]. \end{aligned}

Unlike Burgers equation, the curvature 2-form does not always vanish so we can express it in the suggestive form

F = \frac{1}{\eta} \left(dt E + B\right)

whereE\in\Omega^1andB\in\Omega^2so that

\boxed{\partial_t u_\mu + \left(u\cdot\nabla\right)u_\mu - \frac{\eta}{2}\nabla^2 u_\mu + \partial_\mu p = E_\mu}


\boxed{\frac{1}{2}\left(\partial_\mu u_\nu - \partial_\nu u_\mu\right) = B_{\mu\nu}.}

The above expressions form the Navier-Stokes equations for a Newtonian fluid, whereEis an external force andBis the vorticity.

In the special case of no external force (E=0) and no vorticity (B = 0), the curvature 2-form vanishes again implying the connection 1-form may be expressed as a pure guage

A = \frac{1}{\eta} \left(-dt p + dx^\mu u_m\right) = -\left(d\phi\right) \phi^{-1}

so that

\boxed{u_\mu = -\eta\left(\partial_\mu\phi\right)\phi^{-1}\quad\text{and}\quad \partial_t\phi - \frac{\eta}{2}\nabla^2\phi - \frac{1}{\eta}p\phi = 0}

which is a Cole-Hopf-like transformation for the vorticity-free Navier-Stokes equation.

Weighted Likelihood for Time-Varying Gaussian Parameter Estimation

In a previous article, we presented a weighted likelihood technique for estimating parameters \theta of a probability density function \rho(x|\theta). The motivation being that for time series, we may wish to weigh more recent data more heavily. In this article, we will apply the technique to a simple Gaussian density

\rho(x|\mu,\nu) = \frac{1}{\sqrt{\pi\nu}} \exp\left[-\frac{(x-\mu)^2}{\nu}\right].

In this case, the log likelihood is given by

\begin{aligned} \log\mathcal{L}(\mu,\nu) &= \sum_{i=1}^N w_i \log\rho(x_i|\mu,\nu) \\ &= -\frac{1}{2} \log\pi\nu - \frac{1}{\nu} \sum_{i=1}^N w_i \left(x_i - \mu\right)^2 \end{aligned}.

Recall that the maximum likelihood occurs when

\begin{aligned} \frac{\partial}{\partial\mu} \log\mathcal{L}(\mu,\nu) = \frac{\partial}{\partial\nu} \log\mathcal{L}(\mu,\nu) = 0. \end{aligned}

A simple calculation demonstrates that this occurs when

\begin{aligned} \mu = \sum_{i=1}^N w_i x_i \end{aligned}


\begin{aligned} \sigma^2 = \sum_{i=1}^N w_i \left(x_i - \mu\right)^2, \end{aligned}

where \sigma^2 = \nu/2.

Introducing a weighted expectation operator for a random variable X with samples x_i given by

\begin{aligned} E_w(X) = \sum_{i=1}^N w_i x_i, \end{aligned}

the Gaussian parameters may be expressed in a familiar form via

\mu = E_w(X)


\sigma^2 = E_w(X^2) - \left[E_w(X)\right]^2.

This simple result justifies the use of weighted expectations for time varying Gaussian parameter estimation. As we will see, this is also useful for coding financial time series analysis.