# Automatic Differentiation

I recently published a simple Julia package, StochasticDiff.jl, to demonstrate a natural extension of automatic differentiation (AD) to stochastic processes.

Writing the code was actually the easy part, but I struggled to come up with a satisfying explanation for how it works. In the process, I arrived at an interpretation I’m happy with that applies to both standard and stochastic AD and, in fact, extends to other differential graded algebras and beyond.

The idea was inspired by a beautiful Microsoft Research talk given by Conal Elliott.

Elliott highlights the importance of AD being “compositional”. That is, the derivative of the product of two objects should be expressible in terms of those two objects.

To quote from his article:

Strictly speaking, Theorem 1 is not a compositional recipe for differentiating sequential compositions, i.e., it is not the case $D (g\circ f )$ can be constructed solely from $D g$ and $D f$.

In this article, inspired by Elliott, I’ll introduce an interpretation of AD from the perspective of differential graded algebras and point toward avenues for further work.

## Review

For a brief review and to establish some notation, consider a graded algebra

$\displaystyle \Omega = \bigoplus_{p=0}^n \Omega^p,$

where $\Omega^0$ is a commutative algebra and $\Omega^p$ for $p>0$ are $\Omega^0$-modules.

A derivative $d$ on a graded algebra $\Omega$ is a nilpotent map (of degree +1 or -1) satisfying the product rule

$\displaystyle d(\alpha\beta) = (d\alpha)\beta + (-1)^{|\alpha|} \alpha(d\beta)$

for any $\alpha\in\Omega^p$ and $\beta\in\Omega^q$, where $|\alpha|=p$ is the degree of $\alpha$.

A differential graded algebra $(\Omega,d)$ is a graded algebra $\Omega$ together with a derivative $d$.

In this article, we consider differential graded algebras $(\Omega(\mathcal{M}),d)$ over some underlying space $\mathcal{M}$ in which case $\Omega^0(\mathcal{M})$ is the algebra of functions $f: \mathcal{M}\to\mathbb{R}$ with

$(f g)(p) := f(p)g(p)$

and

$(f+g)(p) := f(p) + g(p).$

## Composability Revisited

Motivated by Elliott’s concept of composability, we introduce a map $\pi^*:\Omega\to\Omega\times d\Omega$ defined by

$\displaystyle \pi^*(\alpha) := (\alpha,d\alpha)$

and let $\Omega^*$ denote its image, i.e.

$\displaystyle \Omega^* := \pi^*\left[\Omega\right]\subset \Omega\times d\Omega$

with projections

$\pi_1(\alpha,d\alpha) = \alpha$ and $\pi_2 (\alpha,d\alpha) = d\alpha,$

a product defined by

$\displaystyle \pi^*(\alpha) \pi^*(\beta) := \pi^*(\alpha \beta)$

and derivative defined by

$\displaystyle d \pi^*(\alpha) := \pi^*(d\alpha)$

so that $(\Omega^*,d)$ is also a differential graded algebra. In fact, when restricted to $\Omega^*$, $\pi^*$ is the inverse of $\pi_1$, so $(\Omega,d)$ and $(\Omega^*,d)$ are effectively equivalent.

The object $\pi^*(\alpha)$ encodes the information required to make the above operations composable in the sense that

$d\left[\pi^*(\alpha)\right] = \pi^*(d\alpha) = (d\alpha,0) = (\pi_2\pi^*(\alpha),0)$

and

$d\left[\pi^*(\alpha)\pi^*(\beta)\right] = \pi^*\left[d(\alpha\beta)\right] = \pi^*\left[\pi_2\pi^*(\alpha)\,\pi_1\pi^*(\beta) + (-1)^{|\alpha|} \pi_1\pi^*(\alpha)\,\pi_2\pi^*(\beta)\right]$

can be expressed entirely in terms of $\pi^*(\alpha)$ and $\pi^*(\beta)$.
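For 0-forms, this pairing is exactly the dual-number trick behind forward-mode AD. As a quick illustration (a Python sketch rather than Julia, with a made-up class name `D` that is not the StochasticDiff.jl interface), carrying $(\alpha,d\alpha)$ through arithmetic via the product rule computes derivatives compositionally:

```python
# Sketch: pi*(alpha) = (alpha, d alpha) carried as a (value, derivative) pair.
# The product rule makes multiplication composable: each operation needs only
# the pairs of its arguments, never the expressions that produced them.
class D:
    def __init__(self, val, dot):
        self.val = val  # pi_1: the object itself
        self.dot = dot  # pi_2: its differential

    def __add__(self, other):
        return D(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # degree-0 product rule: d(ab) = (da) b + a (db)
        return D(self.val * other.val,
                 self.dot * other.val + self.val * other.dot)

# Evaluate f(x) = x (x + x) = 2 x^2 and its derivative in one pass at x = 3.
x = D(3.0, 1.0)   # seed d(x) = 1
y = x * (x + x)   # y.val = 18.0, y.dot = 12.0
```

Each operation only ever sees the value/derivative pairs of its arguments, which is precisely the composability property above.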

## Maps

Motivated again by composability, consider two differential graded algebras $\Omega(\mathcal{M})$ and $\Omega(\mathcal{N})$ over spaces $\mathcal{M}$ and $\mathcal{N}$ together with a map $F:\mathcal{M}\to\mathcal{N}$ inducing a pullback map

$\displaystyle F^*:\Omega^0(\mathcal{N})\to\Omega^0(\mathcal{M})$

defined by

$F^*(f) = f\circ F\in\Omega^0(\mathcal{M})$ for all $f\in\Omega^0(\mathcal{N})$

and we want to compute

$\displaystyle \pi^*\left[F^*(f)\right] = \pi^*(f\circ F).$

Composability means we should be able to compute this in terms of $\pi^*(f)$ and some, yet to be determined, $\pi^*(F)$.

First expand

$\displaystyle \pi^*(f\circ F) = (f\circ F,d(f\circ F))$

and define

$\displaystyle d(f\circ F) := df\circ dF := (dF)^*(df)$

where $dF$ is a linear differential map and

$(dF)^*:\Omega^1(\mathcal{N})\to\Omega^1(\mathcal{M})$

so that

$\displaystyle \pi^*(f\circ F) = (f\circ F,df\circ dF) = (f,df)\circ(F,dF).$

This motivates, in complete analogy, the definition

$\pi^*(F) := (F,dF)$

so that

$\displaystyle \pi^*(F^*(f)) = (\pi_1\pi^*(f)\circ\pi_1\pi^*(F),\pi_2\pi^*(f)\circ\pi_2\pi^*(F))$

is expressible entirely in terms of $\pi^*(f)$ and $\pi^*(F)$.

Composition of maps is defined by

$\pi^*(F)\circ\pi^*(G) := (F\circ G, dF\circ dG)$

so that

$\pi^*(F)\circ\pi^*(G) = \pi^*(F\circ G)$

with $d(F\circ G) = dF\circ dG$.

Combining $F$ with its differential map $dF$ is exactly what is required to maintain composability in the same manner that $\alpha$ must be combined with $d\alpha$ to maintain composability.

## Coordinates

Consider a neighborhood $\mathcal{U}_{\mathcal{M}}\subset\mathcal{M}$ of a smooth $m$-dimensional space $\mathcal{M}$ admitting $m$ coordinate functions $x^\mu: \mathcal{U}_{\mathcal{M}}\to\mathbb{R}$ and a coordinate map $\phi: \mathcal{U}_{\mathcal{M}}\to\mathbb{R}^m$ given by

$\phi(p) = (x^1(p),\cdots,x^m(p)).$

A smooth function $f: \mathcal{M}\to\mathbb{R}$ can be parameterized in terms of the coordinates $x^\mu$ (within $\mathcal{U}_{\mathcal{M}}$) via

$\displaystyle f = f \circ (\phi^{-1} \circ \phi) = (f \circ \phi^{-1})(x^1, \cdots ,x^m)$

with differential given by (summation implied)

$\displaystyle df = \frac {\partial f}{\partial x^\mu} dx^\mu.$

Although $\pi^*(f) = (f,df)$ encodes what we need for composability, in order to implement this in code, we need to express $\pi^*(f)$ in coordinates. To do so, write

$\displaystyle \pi^*(f) = (f,df) := \left(f,\left[\frac{\partial f}{\partial\phi}\right]\right)_\phi,$

where

$\displaystyle \left[\frac{\partial f}{\partial \phi}\right] := \begin{bmatrix}\frac{\partial f}{\partial x^1} & \cdots & \frac{\partial f}{\partial x^m}\end{bmatrix}$

is a row vector representing the components $\frac{\partial f}{\partial x^\mu}$ of $df$ in coordinates $\phi = (x^1,\cdots,x^m)$.

Similarly, consider a neighborhood $\mathcal{U}_{\mathcal{N}}\subset\mathcal{N}$ of a smooth $n$-dimensional space $\mathcal{N}$ with $n$ coordinate functions $y^\nu: \mathcal{U}_{\mathcal{N}}\to\mathbb{R}$, a coordinate map $\psi: \mathcal{U}_{\mathcal{N}}\to\mathbb{R}^n$ given by

$\displaystyle \psi(p) = (y^1(p),\cdots,y^n(p))$

and a smooth map $F: \mathcal{M}\to\mathcal{N}$ parameterized (within the neighborhood $\mathcal{U}_{\mathcal{M}}\cap F^{-1}[\mathcal{U}_{\mathcal{N}}]$) via

$\displaystyle \pi^*(F) = (F,dF) := \left( F, \left[\frac{\partial(\psi\circ F)}{\partial \phi}\right] \right)_{(\psi,\phi)},$

where

$\displaystyle \left[\frac{\partial(\psi\circ F)}{\partial \phi}\right] := \begin{bmatrix} \frac{\partial(y^1\circ F)}{\partial x^1} & \cdots & \frac{\partial(y^1\circ F)}{\partial x^m} \\ \vdots & \ddots & \vdots \\ \frac{\partial(y^n\circ F)}{\partial x^1} & \cdots & \frac{\partial(y^n\circ F)}{\partial x^m} \end{bmatrix}$

is the Jacobian matrix.

Given a third map $G: \mathcal{L}\to\mathcal{M}$, where $\mathcal{L}$ has coordinate map $\chi$, with

$\displaystyle \pi^*(G) = \left(G, \left[\frac{\partial(\phi\circ G)}{\partial \chi}\right]\right)_{(\phi,\chi)}$

we have

$\displaystyle \pi^*(F)\circ\pi^*(G) := \left(F\circ G,\left[\frac{\partial(\psi\circ F)}{\partial \phi}\right] \left[\frac{\partial(\phi\circ G)}{\partial \chi}\right] \right)_{(\psi,\chi)}.$

Comparing this to

$\displaystyle \pi^*(F\circ G) = \left(F\circ G,\left[\frac{\partial(\psi\circ F\circ G)}{\partial \chi}\right]\right)_{(\psi,\chi)}$

confirms, via the chain rule, that

$\pi^*(F)\circ\pi^*(G) = \pi^*(F\circ G)$

as it should.
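Numerically, this says Jacobians compose by matrix multiplication. Here is a small sanity check of the chain rule in this representation; the maps `F` and `G` and their hand-coded Jacobians are illustrative choices, not anything from the text:

```python
import numpy as np

# Sketch: pi*(F) = (F, dF) as a (function, Jacobian-function) pair, with
# composition (F, dF) o (G, dG) = (F o G, dF(G(.)) @ dG(.)).
def compose(piF, piG):
    F, dF = piF
    G, dG = piG
    return (lambda p: F(G(p)), lambda p: dF(G(p)) @ dG(p))

# G: R^2 -> R^2 and F: R^2 -> R with hand-coded Jacobians (illustrative).
G  = lambda p: np.array([p[0] * p[1], p[0] + p[1]])
dG = lambda p: np.array([[p[1], p[0]], [1.0, 1.0]])
F  = lambda q: np.array([np.sin(q[0]) + q[1] ** 2])
dF = lambda q: np.array([[np.cos(q[0]), 2.0 * q[1]]])

FG, dFG = compose((F, dF), (G, dG))
p = np.array([0.3, -1.2])

# Finite-difference check of the composed Jacobian at p.
eps = 1e-6
fd = np.column_stack([(FG(p + eps * np.eye(2)[k]) - FG(p)) / eps
                      for k in range(2)])
assert np.allclose(fd, dFG(p), atol=1e-4)
```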

## Summary and Future Work

This article is basically the result of my trying to find a satisfactory way to explain the almost magical results demonstrated in StochasticDiff.jl. As reviewed in a few articles on this blog, stochastic calculus can be formulated as a noncommutative differential algebra, so that was a natural framework for me to view the results, but it is not really necessary. The main lesson here, clearly and beautifully enunciated by Conal Elliott, is the importance of composability in computation.

By enlarging $\Omega$ to $\Omega^*$ via $\pi^*:\Omega\to\Omega^*$ which combines objects and their differentials, we demonstrated that the basic operations of differential graded algebras are composable and hence computable in a computer program.

For example, we now have

$\displaystyle \pi^*\left[d(\alpha\beta)\right] = d\left[\pi^*(\alpha)\pi^*(\beta)\right] = \left[d\pi^*(\alpha)\right]\pi^*(\beta) + (-1)^{|\alpha|} \pi^*(\alpha)\left[d\pi^*(\beta)\right]$

and

$\displaystyle \pi^*\left[F^*(f)\right] = \pi^*(f)\circ\pi^*(F)$

which are expressible entirely in terms of $\pi^*(\alpha)$, $\pi^*(\beta)$ and $\pi^*(F)$.

In terms of category theory, there are commuting diagrams everywhere here, which suggests these results can be extended to large swaths of differential geometry and topology, opening the door to many possibilities for future work. You can tell my interests by scrolling through my articles here, so those are likely targets.

StochasticDiff.jl was just a proof of concept, but a compelling one. I will probably clean it up, generalize it a bit and try to create some examples with changes of coordinates, etc.

This article is a bit mixed. Some parts discuss higher order forms, but the pullback map stopped with forms of degree 0 and 1 since that is what I need to start writing code. The path to higher order forms (via tensor product) is clear though, so that would be a nice topic for further work.

Another interesting avenue would be to consider vector fields and the interior product. I don’t see any major stumbling blocks to working that out. That would then lead to mixed covariant and contravariant tensors, Lie derivatives, Dirac equations, etc. all formulated in a composable / computable manner allowing for some interesting computer programs.

## References (or Lack Thereof)

For me, this was a passion project and I had a lot of fun working all this out on my own. I worked through nearly an entire notebook of scribbled hieroglyphic-like notes. My only reference (other than Wikipedia to remind me of some basic definitions) was Conal Elliott’s video and his paper, so please refer to that paper and the references therein. Also, please let me know if there are any additional references that should be credited here for prior work and I will be happy to add them.

# Discrete Black-Scholes Model

This post is part of a series.

In the previous post, we found that discrete geometric Brownian motion

$dS = \mu S dt + \sigma S dx$

has the closed-form solution

$\displaystyle S(i,j) = S(0,0) R^{\frac{j+i}{2}} L^{\frac{j-i}{2}}$

where

$R = 1+\mu\Delta t+\sigma\Delta x\quad\text{and}\quad L = 1+\mu\Delta t-\sigma\Delta x.$

In this post, I revisit some results presented back in 2004.

### Risk-Free Bond Price

The price of a risk-free bond will be modeled as

$dB = r B dt$

which has the closed-form solution

$\boxed{B(i,j) = B(0,0) (1+r\Delta t)^j.}$

### Stock Price

The price of a stock will be modeled as a geometric Brownian motion

$dS = \mu S dt + \sigma S dx.$

However, we will be working in a risk-neutral measure meaning the drift will be the risk-free rate so that

$dS = r S dt + \sigma S d\tilde x,$

where

$\displaystyle d\tilde x = dx - \frac{\mu-r}{\sigma} dt.$

In what follows, we assume the risk-neutral measure and drop the tilde, writing simply

$dS = r S dt + \sigma S dx$

so that

$\displaystyle \boxed{S(i,j) = S(0,0) (1+r\Delta t+\sigma\Delta x)^{\frac{j+i}{2}} (1+r\Delta t-\sigma\Delta x)^{\frac{j-i}{2}}.}$

### Self Financing

A portfolio $\Pi$ can be constructed consisting of an option $V$, a stock $S$ and a risk-free bond $B$, i.e.

$\displaystyle \Pi = \alpha V + \delta S + \beta B,$

where $\alpha,\delta,\beta$ are units held in the respective security.

The Black-Scholes model assumes the portfolio is self-financing, i.e. to increase the units held in one security, another must be sold, so that no money flows into or out of the portfolio. A self-financing portfolio has a simple expression in discrete stochastic calculus:

$(d\alpha) V + (d\delta) S + (d\beta) B = 0,$

i.e. the total change in value of the portfolio due to trading activity is zero.

Consequently, if $\Pi$ is self-financing, then

$\boxed{d\Pi = \alpha dV + \delta dS + \beta dB.}$

### Hedging Strategy

The final ingredient of the Black-Scholes model is that there is a no-arbitrage hedging strategy $\delta$ that eliminates the “risk” of the portfolio so that

$d\Pi = r\Pi dt.$

For the risky $dx$ component of $d\Pi$ to vanish, the $\mathbf{e}^+$ component must equal the $\mathbf{e}^-$ component.

Expanding $d\Pi$ and comparing components gives the hedging strategy

$\displaystyle \boxed{\delta(j) = -\alpha(j) \frac{V(i+1,j+1)-V(i-1,j+1)}{S(i+1,j+1)-S(i-1,j+1)}.}$

### Discrete Black-Scholes Equation

Inserting the hedging strategy above into

$d\Pi = r\Pi dt = \alpha dV + \delta dS + \beta dB$

and expanding results in the discrete Black-Scholes equation

$\displaystyle \boxed{V(i,j) = \frac{1}{1+r\Delta t}\left[V(i+1,j+1) p^+(j) + V(i-1,j+1) p^-(j)\right],}$

where

$\displaystyle p^+(j) = \frac{(1+r\Delta t) S(i,j) - S(i-1,j+1)}{S(i+1,j+1) - S(i-1,j+1)}$

and $p^-(j) = 1-p^+(j).$

The discrete Black-Scholes equation is essentially the Cox-Ross-Rubinstein model.

### Simplifications

Since we have the closed-form expression for $S$, several simplifications result. For instance, we have

$S(i+1,j+1) - S(i-1,j+1) = 2\sigma\Delta x S(i,j)$

and

$(1+r\Delta t) S(i,j) - S(i-1,j+1) = \sigma\Delta x S(i,j)$

so that the hedging strategy simplifies to

$\displaystyle \boxed{\delta(j) = -\alpha(j)\frac{V(i+1,j+1) - V(i-1,j+1)}{2\sigma\Delta x S(i,j)}}$

and $p^+(j) = p^-(j) = 1/2$ so that

$\displaystyle \boxed{V(i,j) = \frac{V(i+1,j+1)+V(i-1,j+1)}{2(1+r\Delta t)}.}$
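As a sketch of how the simplified recursion prices an option in practice, the following backward induction uses the grid condition $\Delta t = (\Delta x)^2$; the payoff, strike, rate and volatility are my own illustrative choices, not values from the text:

```python
# Sketch: backward induction on the binary tree using the simplified discrete
# Black-Scholes recursion V(i,j) = [V(i+1,j+1) + V(i-1,j+1)] / (2 (1 + r dt)).
import math

S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
N = 200                      # time steps
dt = T / N
dx = math.sqrt(dt)           # grid condition dt = (dx)^2
R = 1 + r * dt + sigma * dx  # up factor
L = 1 + r * dt - sigma * dx  # down factor

# Terminal call payoffs at level j = N; node i runs over -N, -N+2, ..., N.
V = [max(S0 * R ** ((N + i) // 2) * L ** ((N - i) // 2) - K, 0.0)
     for i in range(-N, N + 1, 2)]

# Step back to j = 0 with p+ = p- = 1/2.
for j in range(N, 0, -1):
    V = [(V[k + 1] + V[k]) / (2 * (1 + r * dt)) for k in range(len(V) - 1)]

price = V[0]  # approaches the continuum Black-Scholes value as N grows
```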

# Discrete Stochastic Differential Equations

This post is part of a series.

In this post, I begin looking at discrete stochastic differential equations (DSDEs).

Recall that in the continuum, stochastic differential equations (SDEs) are of the form

$dS = \mu(t,S) dt + \sigma(t,S) dx.$

When transcribing continuum models to the discrete setting, the first challenge is to determine how to write down the DSDEs. Due to the noncommutative nature of discrete stochastic calculus,

$dS = dt \mu(t,S) + dx \sigma(t,S)$

with 0-forms on the right is a different model than above with 0-forms on the left. In fact, we could consider linear combinations such as

$dS=\kappa\left[\mu(t,S)dt+\sigma(t,S) dx\right]+(1-\kappa)\left[dt\mu(t,S)+dx\sigma(t,S)\right].$

Consider the continuum special case of geometric Brownian motion with constant coefficients

$dS = \mu S dt + \sigma S dx$

with known solution

$\displaystyle S(t,x) = S(0,0) \exp\left[\left(\mu - \frac{\sigma^2}{2}\right) t + \sigma x\right].$

With the above solution and switching to the context of noncommutative stochastic calculus, the differential expressed in left components is given by

\begin{aligned} dS&=\left(\partial_t S+\frac{1}{2}\partial_{xx}S\right)dt+\left(\partial_x S\right)dx\\&=\mu S dt+\sigma S dx\end{aligned}

as expected. In contrast, expressing the model in right-component form results in

\begin{aligned} dS&=dt\left(\partial_t S-\frac{1}{2}\partial_{xx}S\right)+dx\left(\partial_x S\right)\\&=dt\, S(\mu-\sigma^2)+dx\, S\sigma\end{aligned},

which, although correct and equivalent to the above, is not the way we would expect the continuum model to be transcribed. With this as our motivation, we will take the left-component form

$dS = \mu(t,S) dt + \sigma(t,S) dx$

to be the way we transcribe continuum models.

## Discrete Geometric Brownian Motion

To solve discrete geometric Brownian motion, first express it in terms of the bases $\mathbf{e}^+$ and $\mathbf{e}^-$:

\begin{aligned}dS &= \mu S dt + \sigma S dx\\&=\mu S\Delta t(\mathbf{e}^++\mathbf{e}^-) + \sigma S\Delta x(\mathbf{e}^+-\mathbf{e}^-)\\&=(\mu\Delta t + \sigma\Delta x) S\mathbf{e}^+ + (\mu\Delta t - \sigma \Delta x)S\mathbf{e}^-\end{aligned}.

The update expressions can be written immediately as

$S(i+1,j+1) = \left[1+\mu\Delta t(j)+\sigma\Delta x(j)\right] S(i,j)$

and

$S(i-1,j+1) = \left[1+\mu\Delta t(j)-\sigma\Delta x(j)\right] S(i,j)$

so that the solution at any node can be expressed as

$\displaystyle \boxed{S(i,j) = S(0,0)R^{\frac{j+i}{2}} L^{\frac{j-i}{2}}}$

where, taking constant grid spacings $\Delta t(j) = \Delta t$ and $\Delta x(j) = \Delta x$,

$R=1+\mu\Delta t+\sigma\Delta x$

and

$L=1+\mu\Delta t-\sigma\Delta x.$

The continuum solution can be written in a similar form as

$S(i,j) = S(0,0) R_c^{\frac{j+i}{2}} L_c^{\frac{j-i}{2}},$

where

\begin{aligned} R_c &= \exp\left[\left(\mu-\frac{\sigma^2}{2}\right)\Delta t + \sigma\Delta x\right]\\&= 1+\left(\mu-\frac{\sigma^2}{2}\right)\Delta t + \sigma\Delta x + \frac{1}{2}\left[\sigma^2(\Delta x)^2+O\left((\Delta x)^3\right)\right]\\&=R + O((\Delta x)^3)\end{aligned}

and

\begin{aligned} L_c &= \exp\left[\left(\mu-\frac{\sigma^2}{2}\right)\Delta t - \sigma\Delta x\right]\\&= 1+\left(\mu-\frac{\sigma^2}{2}\right)\Delta t - \sigma\Delta x + \frac{1}{2}\left[\sigma^2(\Delta x)^2+O\left((\Delta x)^3\right)\right]\\&=L + O((\Delta x)^3).\end{aligned}
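A path on the tree reaches node $(i,j)$ after $(j+i)/2$ up moves and $(j-i)/2$ down moves in some order, and since the constant up/down factors commute, the closed form follows. A quick numerical check with illustrative values:

```python
# Sketch: the closed form S(i,j) = S(0,0) R^((j+i)/2) L^((j-i)/2) agrees with
# stepping the update expressions along a path, for constant coefficients.
import math

S0, mu, sigma = 1.0, 0.07, 0.3
dx = 0.01
dt = dx ** 2                  # grid condition
R = 1 + mu * dt + sigma * dx
L = 1 + mu * dt - sigma * dx

S = S0
for _ in range(60):           # 60 up moves: S(i+1,j+1) = R S(i,j)
    S *= R
for _ in range(40):           # 40 down moves: S(i-1,j+1) = L S(i,j)
    S *= L

j, i = 100, 20                # 100 steps, net displacement i = 60 - 40
closed = S0 * R ** ((j + i) // 2) * L ** ((j - i) // 2)
assert math.isclose(S, closed)
```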

## Discrete Ornstein–Uhlenbeck Process

Consider the discrete Ornstein-Uhlenbeck process

\begin{aligned} dS &= \theta (\mu - S) dt + \sigma dx \\&=\left[\theta(\mu-S)\Delta t+\sigma\Delta x\right]\mathbf{e}^++\left[\theta(\mu-S)\Delta t-\sigma\Delta x\right]\mathbf{e}^-\end{aligned}

with update expressions

$S(i+1,j+1) = S(i,j) \left[1-\theta\Delta t(j)\right]+\theta\mu\Delta t(j)+\sigma\Delta x(j)$

and

$S(i-1,j+1) = S(i,j) \left[1-\theta\Delta t(j)\right]+\theta\mu\Delta t(j)-\sigma\Delta x(j).$

Let

$R_j(S(i,j)) = S(i,j)\left[1-\theta\Delta t(j)\right]+\theta\mu\Delta t(j)+\sigma\Delta x(j)$

and

$L_j(S(i,j)) = S(i,j)\left[1-\theta\Delta t(j)\right]+\theta\mu\Delta t(j)-\sigma\Delta x(j).$

For consistency, we require

$L_{j+1} \circ R_j = R_{j+1} \circ L_j.$

Imposing this consistency condition gives constraints on the grid spacing, i.e.

$\displaystyle \Delta x(j+1) = \frac{-1+\sqrt{1+4\theta\left[\Delta x(j)\right]^2}}{2\theta\Delta x(j)}$

implying dynamic grid spacing.
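The recursion is easy to iterate, and the consistency condition can then be checked directly at sample points. A sketch with illustrative parameter values:

```python
# Sketch: iterate the dynamic-grid recursion for the discrete OU process and
# check the consistency condition L_{j+1} o R_j = R_{j+1} o L_j at sample
# points, using dt(j) = dx(j)^2.
import math

theta, mu, sigma = 0.5, 1.0, 0.2
dx = [0.1]                               # dx(0)
for j in range(50):
    d = dx[-1]
    dx.append((-1 + math.sqrt(1 + 4 * theta * d * d)) / (2 * theta * d))

def R(j, s):                             # up update R_j
    d = dx[j]
    return s * (1 - theta * d * d) + theta * mu * d * d + sigma * d

def Lo(j, s):                            # down update L_j
    d = dx[j]
    return s * (1 - theta * d * d) + theta * mu * d * d - sigma * d

for s in (-1.0, 0.0, 2.5):
    assert math.isclose(Lo(1, R(0, s)), R(1, Lo(0, s)), abs_tol=1e-12)
```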

## General Strategy

Armed with the above examples, we can develop a general strategy for solving discrete stochastic differential equations defined by the general process

\begin{aligned} dS&=\mu(t,S) dt+\sigma(t,S) dx\\&=\left[\mu(t,S)\Delta t+\sigma(t,S)\Delta x\right]\mathbf{e}^+ +\left[\mu(t,S)\Delta t-\sigma(t,S)\Delta x\right]\mathbf{e}^-.\end{aligned}

Write down the update expressions as simply

\begin{aligned} S(i+1,j+1)&=R_j(S(i,j))\\&=S(i,j)+\mu\left[j,S(i,j)\right]\Delta t(j)+\sigma\left[j,S(i,j)\right]\Delta x(j)\end{aligned}

and

\begin{aligned} S(i-1,j+1)&=L_j(S(i,j))\\&=S(i,j)+\mu\left[j,S(i,j)\right]\Delta t(j)-\sigma\left[j,S(i,j)\right]\Delta x(j).\end{aligned}

Finally, impose the consistency condition

$L_{j+1} \circ R_j = R_{j+1} \circ L_j.$

# Dynamic Grids

This post is part of a series.

In the previous post of this series, I defined discrete stochastic calculus as the discrete calculus on a binary tree with $\Delta t = (\Delta x)^2$, so that the coordinate basis 1-forms $dx$ and $dt$ satisfy the commutation relations

• $[dx,x] = dt$
• $[dt,t] = \Delta t dt$
• $[dx,t] = [dt,x] = \Delta t dx.$

A major assumption there was that the grid spacings $\Delta t$ and $\Delta x$ are constants. It turns out that to solve some discrete stochastic differential equations, we need to relax this assumption. In this post, I generalize discrete stochastic calculus to dynamic grids.

Recall the binary tree

We define the coordinates $x$ and $t$ as discrete 0-forms

$\displaystyle x = \sum_{(i,j)} x(i,j) \mathbf{e}^{(i,j)}\quad\text{and}\quad t = \sum_{(i,j)} t(i,j) \mathbf{e}^{(i,j)},$

where we set

$x(i\pm1,j+1) = x(i,j) \pm \Delta x(j)$

and

$t(i\pm1,j+1) = t(i,j) + \Delta t(j),$

i.e. the grid spacings are fixed for a given time, but are allowed to change in time. Therefore,

$dx = \Delta x (\mathbf{e}^+ - \mathbf{e}^-)$

$dt = \Delta t (\mathbf{e}^+ + \mathbf{e}^-)$

where

$\displaystyle \Delta x = \sum_{(i,j)} \Delta x(j) \mathbf{e}^{(i,j)}\quad\text{and}\quad\Delta t = \sum_{(i,j)} \Delta t(j) \mathbf{e}^{(i,j)}.$

Since $\Delta x$ and $\Delta t$ are no longer constant 0-forms, they do not commute with $\mathbf{e}^+$ and $\mathbf{e}^-.$

Straightforward calculations show the generalized coordinates satisfy the commutation relations

$[\mathbf{e}^\pm,x] = \pm \Delta x\mathbf{e}^\pm\quad\text{and}\quad[\mathbf{e}^\pm,t] = \Delta t\mathbf{e}^\pm.$

From there, it is a simple exercise to demonstrate that

$[dx,x] = (\Delta t)^{-1} (\Delta x)^2 dt$

$[dx,t] = [dt,x] = \Delta t dx$

$[dt,t] = \Delta t dt,$

where

$\displaystyle (\Delta t)^{-1} = \sum_{(i,j)} \frac{1}{\Delta t(j)} \mathbf{e}^{(i,j)}.$

Setting

$\Delta t = (\Delta x)^2$

results in the commutation relations

$[dx,x] = dt$

$[dx,t] = [dt,x] = \Delta t dx$

$[dt,t] = \Delta t dt.$

In the continuum limit$\Delta t \to 0$, these reduce to

$[dx,x] = dt$

$[dx,t] = [dt,x] = 0$

$[dt,t] = 0$

giving the noncommutative stochastic calculus.

When the grid spacing$\Delta x(j)$and$\Delta t(j)$are constants, the above reduces to the previously considered case.

In subsequent posts, I will adopt this more general definition of discrete stochastic calculus on dynamic grids.

# Magic Time Step for the Heat Equation

## Wave Equation

In the finite-difference method, there is a fairly remarkable property of the (1+1)-dimensional wave equation

$\partial_t^2 \phi - c^2 \partial_x^2\phi = 0.$

Replacing $\partial_t^2$ and $\partial_x^2$ with central-difference approximations, the node $\phi(i,j+1)$ may be computed as

$\displaystyle \boxed{\phi(i,j+1) = \kappa\phi(i+1,j) + \kappa\phi(i-1,j) + 2(1-\kappa)\phi(i,j) - \phi(i,j-1)}$

where $\phi(i,j)$ is notation for $\phi(i\Delta x,j\Delta t)$ and $\kappa = \frac{(c\Delta t)^2}{(\Delta x)^2}.$

The stencil for updating$\phi(i,j+1)$is illustrated below

However, when

$\Delta t = \frac{1}{c}\Delta x$

then $\kappa=1$ and the $\phi(i,j)$ term drops out so the update expression reduces to

$\displaystyle \boxed{\phi(i,j+1) = \phi(i+1,j) - \phi(i,j-1) + \phi(i-1,j)}$

and does not depend on any physical constants.

This time step is known as the “magic time step” because the above represents an exact solution to the (1+1)-dimensional wave equation, i.e. for given initial conditions, the wave will propagate exactly to within machine precision.

The stencil for the (1+1)-dimensional wave equation when using the magic time steps reduces to the following:

This stencil is more reminiscent of a binary tree.
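The exactness claim is easy to demonstrate: seed the first two time levels with a right-moving pulse $f(i-j)$, and the magic-time-step update reproduces $f(i-j)$ at every later level, exactly. A sketch with an illustrative triangular pulse:

```python
# Sketch: with the magic time step (kappa = 1), the update
# phi(i,j+1) = phi(i+1,j) - phi(i,j-1) + phi(i-1,j)
# propagates a right-moving pulse f(i - j) exactly.
f = lambda i: max(0.0, 1.0 - abs(i - 10) / 3.0)  # illustrative pulse shape

M = 60                                  # spatial nodes i = 0..M-1
prev = [f(i) for i in range(M)]         # phi(i, 0) = f(i)
curr = [f(i - 1) for i in range(M)]     # phi(i, 1) = f(i - 1)

for j in range(1, 20):                  # advance to time level j = 20
    nxt = [0.0] * M
    for i in range(1, M - 1):
        nxt[i] = curr[i + 1] - prev[i] + curr[i - 1]
    prev, curr = curr, nxt

# The pulse has moved 20 nodes to the right with no dispersion at all.
assert all(abs(curr[i] - f(i - 20)) < 1e-12 for i in range(M))
```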

## Heat Equation

Next consider a finite-difference approximation to the (1+1)-dimensional heat equation

$\partial_t\phi - \frac{\sigma^2}{2} \partial_x^2\phi = 0$

using a forward-difference approximation for $\partial_t$ and a central-difference approximation for $\partial_x^2.$ The resulting update equation is given by

$\displaystyle \boxed{\phi(i,j+1) = \frac{\kappa}{2} \phi(i+1,j) + \frac{\kappa}{2} \phi(i-1,j) + (1-\kappa)\phi(i,j)}$

where $\kappa = \frac{\sigma^2\Delta t}{(\Delta x)^2}.$

The stencil for this update expression is given by

Similar to the wave equation, when setting

$\Delta t = \frac{1}{\sigma^2} (\Delta x)^2$

then $\kappa=1$ and the $\phi(i,j)$ term drops out so the update expression reduces to

$\displaystyle \boxed{\phi(i,j+1) = \frac{1}{2}\left[\phi(i+1,j) + \phi(i-1,j)\right]}$

and does not depend on any physical constants. The stencil also reduces to

This stencil is reminiscent of a binary tree.

The above time step for the heat equation is also “magic”.

The reduced update expression has a closed-form (Green's function) solution, namely the normalized binomial coefficient, i.e. if $\phi(0,0) = 1$ then

$\displaystyle \phi(i,j) = \frac{1}{2^{j}} {j\choose \frac{j+i}{2}},$

where $i$ ranges over $-j,-j+2,\dots,j$.
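This closed form can be checked by iterating the reduced update directly from $\phi(0,0)=1$ and comparing against normalized binomial coefficients:

```python
# Sketch: iterate phi(i,j+1) = [phi(i+1,j) + phi(i-1,j)] / 2 from phi(0,0) = 1
# and compare with 2^(-j) C(j, (j+i)/2) at the final level.
from math import comb

J = 12
vals = {0: 1.0}                          # level j = 0: phi(0,0) = 1
for j in range(J):
    nxt = {}
    for i, v in vals.items():            # each node feeds both children
        nxt[i + 1] = nxt.get(i + 1, 0.0) + 0.5 * v
        nxt[i - 1] = nxt.get(i - 1, 0.0) + 0.5 * v
    vals = nxt                           # level j + 1

for i, v in vals.items():                # i runs over -J, -J+2, ..., J
    assert abs(v - comb(J, (J + i) // 2) / 2 ** J) < 1e-12
```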

# Noncommutative Geometry and Navier-Stokes Equation

## Noncommutative Geometry

In this section, I briefly review some elementary concepts of noncommutative geometry on commutative algebras, primarily to collect some notation to be used later on. For a deeper look into the topic, I highly recommend the largely self-contained review articles on the subject.

Consider an (n+1)-dimensional differential graded algebra

$\displaystyle \Omega = \bigoplus_{i=0}^n \Omega^i,\quad d:\Omega^i\to\Omega^{i+1}$

over a commutative algebra of 0-forms $\Omega^0$ with basis 1-forms $d x^0, \dots, d x^n\in\Omega^1.$ The core concept here lies in the fact that 0-forms and 1-forms do not commute, so any 1-form can be written equivalently in either left- or right-component form

$\displaystyle \alpha = \overleftarrow{\alpha_\mu} d x^\mu = d x^\mu \overrightarrow{\alpha_\mu}$

respectively. In particular, the differential of a 0-form can be written as

$\displaystyle d f = \left(\overleftarrow{\partial_\mu} f\right) d x^\mu = d x^\mu \left(\overrightarrow{\partial_\mu} f\right),$

where$\overleftarrow{\partial_\mu}: \Omega^0 \to \Omega^0$ and$\overrightarrow{\partial_\mu}: \Omega^0 \to \Omega^0$ are not necessarily partial derivatives.

The basis 1-forms satisfy the commutation relations

$\displaystyle [d x^\mu,x^\nu] = \overleftarrow{C^{\mu\nu}_{\lambda}} d x^\lambda = d x^\lambda\overrightarrow{C^{\mu\nu}_{\lambda}}.$

In this post, I assume $\overleftarrow{C^{\mu\nu}_{\lambda}}$ and $\overrightarrow{C^{\mu\nu}_{\lambda}}$ are constant 0-forms, so that $\overleftarrow{C^{\mu\nu}_{\lambda}} = \overrightarrow{C^{\mu\nu}_{\lambda}}$, and write both simply as $C^{\mu\nu}_{\lambda}.$ Note, however, that constant coefficients are not a restriction of the general formalism.

Since $[d x^\mu, x^\nu] = [d x^\nu, x^\mu]$, it follows that

$\displaystyle C^{\mu\nu}_{\lambda} = C^{\nu\mu}_{\lambda},$

i.e. $C^{\mu\nu}_{\lambda}$ is symmetric in the upper indices.

More generally,

$\left[d f,g\right] = \left[d g,f\right] = C^{\kappa\lambda}_{\mu}\left(\overleftarrow{\partial_\kappa} f\right)\left(\overleftarrow{\partial_\lambda} g\right) d x^\mu = d x^\mu C^{\kappa\lambda}_{\mu}\left(\overrightarrow{\partial_\kappa} f\right)\left(\overrightarrow{\partial_\lambda} g\right)$

so that the product rule results in

\begin{aligned} d(fg) &= (d f)g + f(d g) \\ &= g(d f) + f(d g) + [d f,g] \\ &= \left[ \left(\overleftarrow{\partial_\mu} f\right) g + f\left(\overleftarrow{\partial_\mu} g\right) + C^{\kappa\lambda}_{\mu} \left(\overleftarrow{\partial_\lambda} f\right)\left(\overleftarrow{\partial_\kappa} g\right) \right] d x^\mu \end{aligned}

and

$\displaystyle \overleftarrow{\partial_\mu}\left(f g\right) = \left(\overleftarrow{\partial_\mu} f\right) g + f\left(\overleftarrow{\partial_\mu} g\right) + C^{\kappa\lambda}_{\mu} \left(\overleftarrow{\partial_\lambda} f\right)\left(\overleftarrow{\partial_\kappa} g\right) .$

Similarly,

$\displaystyle \overrightarrow{\partial_\mu}\left(f g\right) = \left(\overrightarrow{\partial_\mu} f\right) g + f\left(\overrightarrow{\partial_\mu} g\right) - C^{\kappa\lambda}_{\mu}\left(\overrightarrow{\partial_\lambda} f\right)\left(\overrightarrow{\partial_\kappa} g\right) .$

Everything above this point holds for any commutative associative algebra $\Omega^0$ with no assumption of smoothness. However, if $\Omega^0$ consists of smooth 0-forms on $\mathbb{R}^{n+1}$, the left- and right-partial derivatives can be expressed as

$\displaystyle \overleftarrow{\partial_\mu} = \partial_\mu + \sum_{r=2}^{\infty} \frac{1}{r!} C^{i_1i_2}_{j_1}C^{i_3j_1}_{j_2}\dots C^{i_rj_{r-2}}_\mu \partial_{i_1}\dots\partial_{i_r}$

and

$\displaystyle \overrightarrow{\partial_\mu} = \partial_\mu + \sum_{r=2}^{\infty} \frac{(-1)^{r-1}}{r!} C^{i_1i_2}_{j_1}C^{i_3j_1}_{j_2}\dots C^{i_rj_{r-2}}_\mu \partial_{i_1}\dots\partial_{i_r}.$

## Stochastic Calculus

In an earlier paper, a reformulation of stochastic calculus in terms of noncommutative geometry was presented, where $\Omega^0$ consists of smooth 0-forms on $\mathbb{R}^2.$ In this (1+1)-dimensional case, we set $x^0 = t$ and $x^1 = x$ with commutation relations

$\displaystyle [dx,t] = [dt,x] = [dt,t] = 0 \quad\text{and}\quad [dx,x] = \sigma^2 dt.$

In other words

$C^{xx}_t = \sigma^2$

and all other coefficients vanish. In this case, we have

$\overleftarrow{\partial_x} = \overrightarrow{\partial_x} = \partial_x,$

where $\partial_x$ is the partial derivative with respect to $x$, and

$\overleftarrow{\partial_t} = \partial_t + \frac{\sigma^2}{2} \partial^2_x,$

where $\partial_t$ is the partial derivative with respect to $t$, so that

\begin{aligned} df &= \left(\partial_t f + \frac{\sigma^2}{2}\partial^2_x f\right) dt + \left(\partial_x f\right) dx \\ &= dt\left(\partial_t f + \frac{\sigma^2}{2}\partial^2_x f\right) + dx\left(\partial_x f\right) - [dx,(\partial_x f)] \\ &= dt\left(\partial_t f + \frac{\sigma^2}{2}\partial^2_x f\right) + dx\left(\partial_x f\right) - \sigma^2 dt\left(\partial^2_x f\right) \\ &= dt\left(\partial_t f - \frac{\sigma^2}{2}\partial^2_x f\right) + dx\left(\partial_x f\right) \end{aligned}

and we see that

$\overrightarrow{\partial_t} = \partial_t - \frac{\sigma^2}{2} \partial^2_x.$

The Itô formula of stochastic calculus emerges naturally as the left components of the differential $df$, and the heat equation emerges when $df$ is expressed in right components.

## Burgers Equation

In an earlier paper, the (1+1)-dimensional Burgers equation was derived from noncommutative geometry as a zero-curvature condition on $\mathbb{R}^2$. In a previous post on this blog, I corrected a minor sign error in that article and indicated a relation to the Navier-Stokes equation. A simplified derivation is reproduced here.

In this section, let

$\displaystyle [dx,t] = [dt,x] = [dt,t] = 0 \quad\text{and}\quad [dx,x] = \eta dt,$

and let

$A = dx \left(k u\right)$

denote a connection 1-form, where $k$ is a constant and $u$ is a 0-form, so that

\begin{aligned} dA &= -dx \left(kdu\right) \\ &= dt dx \left[k \left(\partial_t u - \frac{\eta}{2} \partial^2_x u\right)\right] \end{aligned}

and

\begin{aligned} AA &= k^2 dx\left(u dx\right) u \\ &= k^2 dx\left(dx u - [dx, u]\right) u \\ &= dt dx \left[k^2\eta\left(\partial_x u\right) u\right]. \end{aligned}

Putting the above together and setting$k = \frac{1}{\eta}$results in the curvature 2-form

$F = dA + AA = dt dx \frac{1}{\eta} \left[ \partial_t u - \frac{\eta}{2} \partial^2_x u + \left(\partial_x u\right) u \right].$

If the curvature 2-form vanishes, we have

$\boxed{\partial_t u - \frac{\eta}{2} \partial^2_x u + \left(\partial_x u\right) u = 0}$

which is the (1+1)-dimensional Burgers equation.

Furthermore, if the curvature 2-form vanishes, it implies we can write the connection 1-form as a pure gauge

$A = \phi^{-1} d\phi = -\left(d\phi^{-1}\right) \phi = -\left(d\psi\right)\psi^{-1},$

where $\psi = \phi^{-1}$. Expanding the differential results in

$A = -\left[dt \left(\partial_t\psi - \frac{\eta}{2}\partial^2_x\psi\right) + dx \left(\partial_x\psi\right)\right] \psi^{-1}$

which implies

$\boxed{u = -\eta\left(\partial_x\psi\right)\psi^{-1}\quad\text{and}\quad \partial_t \psi - \frac{\eta}{2}\partial^2_x\psi = 0.}$

The above relations are referred to as the Cole-Hopf transformation.
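The transformation is easy to verify symbolically: take any solution $\psi$ of the heat equation, form $u = -\eta(\partial_x\psi)\psi^{-1}$, and the Burgers residual vanishes. A sketch using SymPy with an illustrative two-mode heat-equation solution:

```python
# Sketch: verify the Cole-Hopf transformation. If psi solves the heat equation
# d_t psi = (eta/2) d_xx psi, then u = -eta (d_x psi) / psi solves Burgers.
import sympy as sp

t, x, eta, k1, k2 = sp.symbols('t x eta k1 k2', positive=True)
psi = sp.exp(eta * k1**2 * t / 2 + k1 * x) + sp.exp(eta * k2**2 * t / 2 + k2 * x)

# psi solves the heat equation by construction.
heat = sp.diff(psi, t) - eta / 2 * sp.diff(psi, x, 2)
assert sp.simplify(heat) == 0

# Burgers residual d_t u - (eta/2) d_xx u + (d_x u) u, checked numerically
# at arbitrary (illustrative) parameter values.
u = -eta * sp.diff(psi, x) / psi
burgers = sp.diff(u, t) - eta / 2 * sp.diff(u, x, 2) + sp.diff(u, x) * u
residual = burgers.subs({t: 0.3, x: 0.7, eta: 1.3, k1: 0.5, k2: 1.1})
assert abs(float(residual)) < 1e-10
```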

This was trivially extended to (3+1) dimensions in a follow-up post.

Remark: I personally find it mind-boggling that Burgers equation pops out like this and that the Cole-Hopf transformation is almost a tautology.

## Navier-Stokes Equation

In this section, consider a (3+1)-dimensional noncommutative geometry with $x^0 = t$, where implied sums run only over the spatial dimensions $x^\mu$, $\mu = 1,2,3$, and

$\displaystyle [dx^\mu,t] = [dt,x^\mu] = [dt,t] = 0 \quad\text{and}\quad [dx^\mu,x^\nu] = \eta g^{\mu\nu} dt$

so that

$df = dt \left(\partial_t f - \frac{\eta}{2} \nabla^2 f\right) + dx^\mu \left(\partial_\mu f\right)$

where

$\nabla^2 f = g^{\mu\nu} \partial_\mu\partial_\nu f.$

In analogy with Burgers equation, consider a 1-form connection

$A = \frac{1}{\eta} \left(-dt p + dx^\mu u_\mu\right)$

where $\eta$ is constant and $p,u_\mu\in\Omega^0$ so that

$dA = dt dx^\mu \frac{1}{\eta}\left(\partial_t u_\mu - \frac{\eta}{2}\nabla^2 u_\mu + \partial_\mu p\right) + dx^\mu dx^\nu \frac{1}{\eta}\left(\partial_\mu u_\nu\right)$

and

$AA = dt dx^\mu \frac{1}{\eta^2}\left[\eta\left(u\cdot\nabla\right)u_\mu\right]$

where

$\left(u\cdot\nabla\right) = g^{\kappa\lambda} u_\kappa \partial_\lambda.$

Putting the above together results in the curvature 2-form

\begin{aligned} F &= dA + AA \\ &= dt\, dx^\mu \frac{1}{\eta} \left[\partial_t u_\mu + \left(u\cdot\nabla\right)u_\mu - \frac{\eta}{2}\nabla^2 u_\mu + \partial_\mu p\right] + dx^\mu dx^\nu \frac{1}{\eta} \left[\frac{1}{2}\left(\partial_\mu u_\nu - \partial_\nu u_\mu\right)\right]. \end{aligned}

Unlike Burgers equation, the curvature 2-form does not always vanish so we can express it in the suggestive form

$F = \frac{1}{\eta} \left(dt E + B\right)$

where $E\in\Omega^1$ and $B\in\Omega^2$ so that

$\boxed{\partial_t u_\mu + \left(u\cdot\nabla\right)u_\mu - \frac{\eta}{2}\nabla^2 u_\mu + \partial_\mu p = E_\mu}$

and

$\boxed{\frac{1}{2}\left(\partial_\mu u_\nu - \partial_\nu u_\mu\right) = B_{\mu\nu}.}$

The above expressions form the Navier-Stokes equations for a Newtonian fluid, where $E$ is an external force and $B$ is the vorticity.

In the special case of no external force ($E=0$) and no vorticity ($B = 0$), the curvature 2-form vanishes, again implying the connection 1-form may be expressed as a pure gauge

$A = \frac{1}{\eta} \left(-dt p + dx^\mu u_\mu\right) = -\left(d\phi\right) \phi^{-1}$

so that

$\boxed{u_\mu = -\eta\left(\partial_\mu\phi\right)\phi^{-1}\quad\text{and}\quad \partial_t\phi - \frac{\eta}{2}\nabla^2\phi - \frac{1}{\eta}p\phi = 0}$

which is a Cole-Hopf-like transformation for the vorticity-free Navier-Stokes equation.

# Weighted Likelihood for Time-Varying Gaussian Parameter Estimation

In a previous article, we presented a weighted likelihood technique for estimating the parameters $\theta$ of a probability density function $\rho(x|\theta)$. The motivation is that, for time series, we may wish to weight more recent data more heavily. In this article, we will apply the technique to a simple Gaussian density

$\rho(x|\mu,\nu) = \frac{1}{\sqrt{\pi\nu}} \exp\left[-\frac{(x-\mu)^2}{\nu}\right].$

In this case, assuming the weights are normalized so that $\sum_{i=1}^N w_i = 1$, the log-likelihood is given by

\begin{aligned} \log\mathcal{L}(\mu,\nu) &= \sum_{i=1}^N w_i \log\rho(x_i|\mu,\nu) \\ &= -\frac{1}{2} \log\pi\nu - \frac{1}{\nu} \sum_{i=1}^N w_i \left(x_i - \mu\right)^2 \end{aligned}.

Recall that the maximum likelihood occurs when

\begin{aligned} \frac{\partial}{\partial\mu} \log\mathcal{L}(\mu,\nu) = \frac{\partial}{\partial\nu} \log\mathcal{L}(\mu,\nu) = 0. \end{aligned}

A simple calculation demonstrates that this occurs when

\begin{aligned} \mu = \sum_{i=1}^N w_i x_i \end{aligned}

and

\begin{aligned} \sigma^2 = \sum_{i=1}^N w_i \left(x_i - \mu\right)^2, \end{aligned}

where $\sigma^2 = \nu/2$.

Introducing a weighted expectation operator for a random variable $X$ with samples $x_i$ given by

\begin{aligned} E_w(X) = \sum_{i=1}^N w_i x_i, \end{aligned}

the Gaussian parameters may be expressed in a familiar form via

$\mu = E_w(X)$

and

$\sigma^2 = E_w(X^2) - \left[E_w(X)\right]^2.$

This simple result justifies the use of weighted expectations for time-varying Gaussian parameter estimation. As we will see, this is also useful for coding financial time series analysis.
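As a concrete sketch, here is an exponentially weighted version of these estimators; the decay factor and the simulated regime shift are illustrative choices, not from the original article:

```python
# Sketch: exponentially weighted estimates E_w(X) and E_w(X^2) - E_w(X)^2 for
# a series whose mean shifts mid-sample.
import random

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(500)] + \
     [random.gauss(3.0, 1.0) for _ in range(500)]

lam = 0.97                                   # per-step decay (illustrative)
w = [lam ** (len(xs) - 1 - k) for k in range(len(xs))]
total = sum(w)
w = [wk / total for wk in w]                 # normalize: sum_k w_k = 1

mu = sum(wk * xk for wk, xk in zip(w, xs))               # E_w(X)
var = sum(wk * xk * xk for wk, xk in zip(w, xs)) - mu**2  # E_w(X^2) - E_w(X)^2

# Recent data dominate, so mu tracks the post-shift mean near 3.
```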