# Phorgy Phynance

## Weighted Likelihood for Time-Varying Gaussian Parameter Estimation

In a previous article, we presented a weighted likelihood technique for estimating parameters $\theta$ of a probability density function $\rho(x|\theta)$. The motivation is that, for time series, we may wish to weight more recent data more heavily. In this article, we will apply the technique to a simple Gaussian density

$\rho(x|\mu,\nu) = \frac{1}{\sqrt{\pi\nu}} \exp\left[-\frac{(x-\mu)^2}{\nu}\right].$

In this case, using the normalization $\sum_{i=1}^N w_i = 1$, the log likelihood is given by

\begin{aligned} \log\mathcal{L}(\mu,\nu) &= \sum_{i=1}^N w_i \log\rho(x_i|\mu,\nu) \\ &= -\frac{1}{2} \log(\pi\nu) - \frac{1}{\nu} \sum_{i=1}^N w_i \left(x_i - \mu\right)^2. \end{aligned}

Recall that the maximum likelihood occurs when

\begin{aligned} \frac{\partial}{\partial\mu} \log\mathcal{L}(\mu,\nu) = \frac{\partial}{\partial\nu} \log\mathcal{L}(\mu,\nu) = 0. \end{aligned}

A simple calculation demonstrates that this occurs when

\begin{aligned} \mu = \sum_{i=1}^N w_i x_i \end{aligned}

and

\begin{aligned} \sigma^2 = \sum_{i=1}^N w_i \left(x_i - \mu\right)^2, \end{aligned}

where $\sigma^2 = \nu/2$.
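To spell out the calculation (a sketch that relies on the weights being normalized, $\sum_{i=1}^N w_i = 1$), differentiating the log likelihood with respect to $\mu$ gives

\begin{aligned} \frac{\partial}{\partial\mu} \log\mathcal{L}(\mu,\nu) = \frac{2}{\nu} \sum_{i=1}^N w_i \left(x_i - \mu\right) = 0 \quad\Longrightarrow\quad \mu = \sum_{i=1}^N w_i x_i, \end{aligned}

and differentiating with respect to $\nu$ gives

\begin{aligned} \frac{\partial}{\partial\nu} \log\mathcal{L}(\mu,\nu) = -\frac{1}{2\nu} + \frac{1}{\nu^2} \sum_{i=1}^N w_i \left(x_i - \mu\right)^2 = 0 \quad\Longrightarrow\quad \nu = 2 \sum_{i=1}^N w_i \left(x_i - \mu\right)^2. \end{aligned}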

Introducing a weighted expectation operator for a random variable $X$ with samples $x_i$ given by

\begin{aligned} E_w(X) = \sum_{i=1}^N w_i x_i, \end{aligned}

the Gaussian parameters may be expressed in a familiar form via

$\mu = E_w(X)$

and

$\sigma^2 = E_w(X^2) - \left[E_w(X)\right]^2.$

This simple result justifies the use of weighted expectations for time-varying Gaussian parameter estimation. As we will see, this is also useful for coding financial time series analysis.
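As a rough illustration of how the weighted expectations might be coded, here is a minimal Python sketch. The exponentially decaying weights, the `half_life` parameter, the function names and the sample returns are all assumptions made for the example; the result above holds for any weights that are non-negative and sum to one.

```python
def exponential_weights(n, half_life):
    """Normalized weights (w_i >= 0, sum = 1); the most recent sample is last and weighted most."""
    decay = 0.5 ** (1.0 / half_life)
    raw = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_expectation(values, weights):
    """E_w(X) = sum_i w_i x_i."""
    return sum(w * x for w, x in zip(weights, values))


def weighted_gaussian_fit(samples, weights):
    """Return (mu, sigma^2) with mu = E_w(X) and sigma^2 = E_w(X^2) - E_w(X)^2."""
    mu = weighted_expectation(samples, weights)
    second_moment = weighted_expectation([x * x for x in samples], weights)
    return mu, second_moment - mu * mu


# Made-up daily returns, oldest first (illustrative only).
returns = [0.010, -0.020, 0.005, 0.030, -0.015]
weights = exponential_weights(len(returns), half_life=2.0)
mu, var = weighted_gaussian_fit(returns, weights)
print(mu, var)
```

With equal weights $w_i = 1/N$ the same functions return the ordinary sample mean and the (biased) sample variance.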

Written by Eric

February 3, 2013 at 4:33 pm

## More fun with maximum likelihood estimation

A while ago, I wrote a post

### Fun with maximum likelihood estimation

where I jotted down some notes. I ended the post with the following:

Note: The first time I worked through this exercise, I thought it was cute, but I would never compute $\mu$ and $\sigma^2$ as above so the maximum likelihood estimation, as presented, is not meaningful to me. Hence, this is just a warm up for what comes next. Stay tuned…

Well, it has been over a year and I’m trying to get a friend interested in MLE for a side project we might work on together, so I thought it would be good to revisit it now.

To briefly review, the probability of observing $N$ independent samples $X\in\mathbb{R}^N$ may be approximated by

\begin{aligned} P(X|\theta) = \prod_{i = 1}^N P(x_i|\theta) = \left(\Delta x\right)^N \prod_{i=1}^N \rho(x_i|\theta),\end{aligned}

where $\rho(x|\theta)$ is a probability density and $\theta$ represents the parameters we are trying to estimate. The key observation becomes clear after a slight change in perspective.

If we take the $N$th root of the above probability (and divide by $\Delta x$), we obtain the geometric mean of the individual densities, i.e.

\begin{aligned} \langle \rho(X|\theta)\rangle_{\text{geom}} = \prod_{i=1}^N \left[\rho(x_i|\theta)\right]^{1/N}.\end{aligned}

In computing the geometric mean above, each sample is given the same weight, i.e. $1/N$. However, we may have reason to weight some samples more heavily than others, e.g. if we are studying samples from a time series, we may want to weight the more recent data more heavily. This inspired me to replace $1/N$ with an arbitrary weight $w_i$ satisfying

\begin{aligned} w_i\ge 0,\quad\text{and}\quad \sum_{i=1}^N w_i = 1.\end{aligned}

With no apologies for abusing terminology, I’ll refer to this as the likelihood function

\begin{aligned} \mathcal{L}(\theta) = \prod_{i=1}^N \rho(x_i|\theta)^{w_i}.\end{aligned}

Replacing $w_i$ with $1/N$ would result in the same parameter estimation as the traditional maximum likelihood method.

It is often more convenient to work with the log likelihood, which has an even more intuitive expression

\begin{aligned}\log\mathcal{L}(\theta) = \sum_{i=1}^N w_i \log \rho(x_i|\theta),\end{aligned}

i.e. the log likelihood is simply the weighted (arithmetic) average of the log densities.
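To make this concrete, here is a small Python sketch of the weighted log likelihood. The Gaussian density $\frac{1}{\sqrt{\pi\nu}} \exp\left[-(x-\mu)^2/\nu\right]$ is used only as a convenient example density; the function names, the decaying weights and the sample data are assumptions for illustration and not the stable-density code mentioned below.

```python
import math


def gaussian_log_density(x, mu, nu):
    """log rho(x | mu, nu) for rho = exp(-(x - mu)^2 / nu) / sqrt(pi * nu)."""
    return -0.5 * math.log(math.pi * nu) - (x - mu) ** 2 / nu


def weighted_log_likelihood(samples, weights, log_density, *params):
    """log L(theta) = sum_i w_i log rho(x_i | theta)."""
    return sum(w * log_density(x, *params) for w, x in zip(weights, samples))


# Made-up samples, oldest first, with weights that favour recent data.
samples = [0.010, -0.020, 0.005, 0.030, -0.015]
raw = [0.5 ** (len(samples) - 1 - i) for i in range(len(samples))]
weights = [r / sum(raw) for r in raw]

# Closed-form maximizers for the Gaussian case.
mu_hat = sum(w * x for w, x in zip(weights, samples))
nu_hat = 2.0 * sum(w * (x - mu_hat) ** 2 for w, x in zip(weights, samples))

best = weighted_log_likelihood(samples, weights, gaussian_log_density, mu_hat, nu_hat)
shifted = weighted_log_likelihood(samples, weights, gaussian_log_density, mu_hat + 0.001, nu_hat)
print(best, shifted)  # the shifted mean gives a lower weighted log likelihood
```

For densities without a closed-form solution, the same `weighted_log_likelihood` function could be handed to a numerical optimizer instead.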

I use this approach to estimate stable density parameters for time series analysis that is more suitable for capturing risk in the tails. For instance, I used this technique when generating the charts in a post from back in 2009:

### 80 Years of Daily S&P 500 Value-at-Risk Estimates

which was subsequently picked up by Felix Salmon of Reuters in

### How has VaR changed over time?

and Tracy Alloway of Financial Times in

### On baseline VaR

If I find a spare moment, which is rare these days, I’d like to update that analysis and expand it to other markets. A lot has happened since August 2009. Other markets I’d like to look at would include other equity markets as well as fixed income. Due to the ability to cleanly model skew, stable distributions are particularly useful for analyzing fixed income returns.

Written by Eric

October 20, 2012 at 5:02 pm

## Leveraged ETFs: Selling vs Hedging

In this brief note, we’ll compare two similar leveraged ETF strategies. We begin by assuming a portfolio consists of some cash equivalent with daily return $R_{\text{Cash}}$ and an $x$-times leveraged bull ETF with daily return given by

$R_{\text{Long}} = x R_{\text{Index}} - R_{\text{Fee}},$

where $R_{\text{Fee}}$ is the fee charged by the manager. The daily portfolio return is given by

\begin{aligned} R_{\text{Portfolio}} &= w_{\text{Long}} R_{\text{Long}} + w_{\text{Cash}} R_{\text{Cash}} \\ &= w_{\text{Long}} \left(x R_{\text{Index}} - R_{\text{Fee}}\right) + w_{\text{Cash}} R_{\text{Cash}}.\end{aligned}

We wish to reduce our exposure to the index.

### Strategy 1

An obvious thing to do to reduce exposure is to sell some shares of the leveraged ETF. In this case, the weight of the ETF is reduced by $\Delta w$ and the weight of cash increases by $\Delta w$. The daily portfolio return is then

$R_{\text{Strategy 1}} = R_{\text{Portfolio}} + \Delta w \left(-x R_{\text{Index}} + R_{\text{Fee}} + R_{\text{Cash}}\right).$

### Strategy 2

Another way to reduce exposure is to buy shares in the leveraged bear ETF. The daily return of the bear ETF is

$R_{\text{Short}} = -x R_{\text{Index}} - R_{\text{Fee}}.$

The daily return of this strategy is

$R_{\text{Strategy 2}} = R_{\text{Portfolio}} + \Delta w \left(-x R_{\text{Index}} - R_{\text{Fee}} - R_{\text{Cash}}\right).$

### Comparison

For most, I think it should be fairly obvious that Strategy 1 is preferred. However, I occasionally come across people with positions in both the bear and bull ETFs. The difference in the daily return of the two strategies is given by

$\Delta R = R_{\text{Strategy 1}} - R_{\text{Strategy 2}} = 2\Delta w\left(R_{\text{Fee}} + R_{\text{Cash}}\right).$

In other words, if you reduce exposure by buying the bear ETF, you’ll get hit both by fees as well as lost return on your cash equivalent.
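A quick numerical check of the comparison, as a Python sketch. All of the input numbers (leverage, index return, fee, cash return, weights) are made up for illustration, and the function name is my own. The difference between the two strategies works out to $2\Delta w\left(R_{\text{Fee}} + R_{\text{Cash}}\right)$, independent of the index return.

```python
def strategy_returns(x, r_index, r_fee, r_cash, w_long, w_cash, dw):
    """Daily returns of the original portfolio, Strategy 1 and Strategy 2."""
    r_long = x * r_index - r_fee    # bull ETF
    r_short = -x * r_index - r_fee  # bear ETF
    portfolio = w_long * r_long + w_cash * r_cash
    # Strategy 1: sell dw of the bull ETF and hold the proceeds as cash.
    s1 = (w_long - dw) * r_long + (w_cash + dw) * r_cash
    # Strategy 2: keep the bull ETF and move dw of cash into the bear ETF.
    s2 = w_long * r_long + (w_cash - dw) * r_cash + dw * r_short
    return portfolio, s1, s2


# Illustrative inputs only.
x, r_index, r_fee, r_cash = 3.0, 0.01, 0.00004, 0.0001
w_long, w_cash, dw = 0.6, 0.4, 0.1

portfolio, s1, s2 = strategy_returns(x, r_index, r_fee, r_cash, w_long, w_cash, dw)
print(s1 - s2, 2 * dw * (r_fee + r_cash))
```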

Unless you’ve got some interesting derivatives strategy (which I’d love to hear about), I recommend not holding both the bear and bull ETFs simultaneously.

Note: I remain long BGU (which is now SPXL) at a cost of US$36 as a long-term investment – despite experts warning against holding these things. It closed yesterday at US$90.92.

Written by Eric

October 2, 2012 at 3:24 pm

## Discrete Stochastic Calculus

This post is part of a series

In the previous post of this series, we found that when Cartesian coordinates are placed on a binary tree, the commutative relations are given by

• $[dx,x] = \frac{(\Delta x)^2}{\Delta t} dt$
• $[dt,t] = \Delta t dt$
• $[dx,t] = [dt,x] = \Delta t dx.$

There are two distinct classes of discrete calculus depending on the relation between $\Delta x$ and $\Delta t$.

### Discrete Exterior Calculus

If we set $\Delta x = \Delta t$, the commutative relations reduce to

• $[dx,x] = \Delta t dt$
• $[dt,t] = \Delta t dt$
• $[dx,t] = [dt,x] = \Delta t dx$

and in the continuum limit, i.e. $\Delta t\to 0$, reduce to

• $[dx,x] = 0$
• $[dt,t] = 0$
• $[dx,t] = [dt,x] = 0.$

In other words, when $\Delta x = \Delta t$, the commutative relations vanish in the continuum limit and the discrete calculus converges to the exterior calculus of differential forms.

Because of this, the discrete calculus on a binary tree with $\Delta x = \Delta t$ will be referred to as the discrete exterior calculus.

### Discrete Stochastic Calculus

If instead of $\Delta x = \Delta t$, we set $(\Delta x)^2 = \Delta t$, the commutative relations reduce to

• $[dx,x] = dt$
• $[dt,t] = \Delta t dt$
• $[dx,t] = [dt,x] = \Delta t dx$

and in the continuum limit, i.e. $\Delta t\to 0$, reduce to

• $[dx,x] = dt$
• $[dt,t] = 0$
• $[dx,t] = [dt,x] = 0.$

In this case, all commutative relations vanish in the continuum limit except $[dx,x] = dt$.

In the paper:

• #### Noncommutative Geometry and Stochastic Calculus: Applications in Mathematical Finance

I demonstrate how the continuum limit of the commutative relations gives rise to (a noncommutative version of) stochastic calculus, where $dx$ plays the role of a Brownian motion.

Because of this, the discrete calculus on a binary tree with $(\Delta x)^2 = \Delta t$ will be referred to as the discrete stochastic calculus.
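One way to get a feel for the two scalings is a small numerical experiment. This is only an illustration and not the construction used in the paper: it walks along a binary tree with steps of $\pm\Delta x$ and adds up the squared steps. Over a total time $T$ that sum is $T (\Delta x)^2/\Delta t$, the same coefficient that appears in $[dx,x]$, so it stays equal to $T$ when $(\Delta x)^2 = \Delta t$ and shrinks to zero when $\Delta x = \Delta t$. The function name and arguments are my own for this sketch.

```python
import math
import random


def walk_on_binary_tree(n_steps, total_time, stochastic_scaling, seed=0):
    """Random walk with steps of +/- dx; returns (final x, sum of squared steps)."""
    rng = random.Random(seed)
    dt = total_time / n_steps
    # (dx)^2 = dt for the stochastic case, dx = dt for the exterior case.
    dx = math.sqrt(dt) if stochastic_scaling else dt
    x = 0.0
    sum_sq = 0.0
    for _ in range(n_steps):
        step = dx if rng.random() < 0.5 else -dx
        x += step
        sum_sq += step * step
    return x, sum_sq


for n in (10, 100, 1000):
    _, qv_stochastic = walk_on_binary_tree(n, 1.0, True)
    _, qv_exterior = walk_on_binary_tree(n, 1.0, False)
    print(n, qv_stochastic, qv_exterior)
```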

To date, discrete stochastic calculus has found robust applications in mathematical finance and fluid dynamics. For instance, the application of discrete stochastic calculus to Black-Scholes option pricing was presented in

• #### Financial Modelling Using Discrete Stochastic Calculus

and the application to fluid dynamics was presented in

• #### Discrete Burgers Equation Revisited

Both of these subjects will be addressed in more detail as part of this series of articles.

It should be noted that discrete calculus and its special cases of discrete exterior calculus and discrete stochastic calculus represent a new framework for numerical modeling. We are not taking continuum models built on continuum calculus and constructing finite approximations. Instead, we are building a robust mathematical framework that has finiteness built in from the outset. The resulting numerical models are not approximations, but exact models developed within a finite numerical framework. The framework itself converges to the continuum versions so that any numerical models built within this framework will automatically converge to the continuum versions (if such a thing is desired).

Discrete calculus provides a kind of meta algorithm. It is an algorithm for generating algorithms.

Written by Eric

August 25, 2012 at 12:38 pm

## Network Theory and Discrete Calculus – Notation Revisited

This post is part of a series

As stated in the Introduction to this series, one of my goals is to follow along with John Baez’ series and reformulate things in the language of discrete calculus. Along the way, I’m coming across operations that I haven’t used in any of my prior applications of discrete calculus to mathematical finance and field theories. For instance, in The Discrete Master Equation, I introduced a boundary operator

\begin{aligned} \partial \mathbf{e}^{i,j} = \mathbf{e}^j-\mathbf{e}^i.\end{aligned}

Although I hope the reason I call this a boundary operator is obvious, it would be more precise to call this something like graph divergence. To see why, consider the boundary of an arbitrary discrete 1-form

\begin{aligned}\partial \alpha = \sum_{i,j} \alpha_{i,j} \left(\mathbf{e}^j - \mathbf{e}^i\right) = \sum_i \left[ \sum_j \left(\alpha_{j,i} - \alpha_{i,j}\right)\right] \mathbf{e}^i.\end{aligned}

A hint of sloppy notation has already crept in here, but we can see that the boundary of a discrete 1-form at a node $i$ is the sum of coefficients flowing into node $i$ minus the sum of coefficients flowing out of node $i$. This is what you would expect of a divergence operator, but divergence depends on a metric. This operator does not, hence it is topological in nature. It is tempting to call this a topological divergence, but I think graph divergence is a better choice for reasons to be seen later.

One reason the above notation is a bit sloppy is because in the summations, we should really keep track of what directed edges are actually present in the directed graph. Until now, simply setting

$\mathbf{e}^{i,j} = 0$

if there is no directed edge from node $i$ to node $j$ was sufficient. Not anymore.

Also, in the applications where I’ve used discrete calculus so far, there has only ever been a single directed edge connecting any two nodes. When applying discrete calculus to electrical circuits, as John has started doing in his series, we obviously would like to consider elements that are in parallel.

I tend to get hung up on notation and have thought about the best way to deal with this. My solution is not perfect and I’m open to suggestions, but what I settled on is to introduce a summation not only over nodes, but also over the directed edges connecting those nodes. Here it is for an arbitrary discrete 1-form

\begin{aligned}\alpha = \sum_{i,j} \sum_{\epsilon\in [i,j]} \alpha_{i,j}^{\epsilon} \mathbf{e}_\epsilon^{i,j},\end{aligned}

where $[i,j]$ is the set of all directed edges from node $i$ to node $j$. I’m not 100% enamored with it, but it is handy for performing calculations and doesn’t make me think too much.

For example, with this new notation, the boundary operator is much clearer

\begin{aligned} \partial \alpha &= \sum_{i,j} \sum_{\epsilon\in [i,j]} \alpha_{i,j}^{\epsilon} \left(\mathbf{e}^{j}-\mathbf{e}^i\right) \\ &= \sum_i \left[\sum_j \left( \sum_{\epsilon\in[j,i]} \alpha_{j,i}^{\epsilon} - \sum_{\epsilon\in[i,j]} \alpha_{i,j}^{\epsilon} \right)\right]\mathbf{e}^i.\end{aligned}

As before, this says the graph divergence of $\alpha$ at the node $i$ is the sum of all coefficients flowing into node $i$ minus the sum of all coefficients flowing out of node $i$. Moreover, for any node $j$ there can be one or more (or zero) directed edges from $j$ into $i$.
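Here is a short Python sketch of the graph divergence with parallel edges. The data layout (a list of `(i, j, coefficient)` triples with one entry per directed edge) and the function name are assumptions made for illustration; parallel edges are just repeated entries with the same endpoints.

```python
from collections import defaultdict


def graph_divergence(edges):
    """Return {node: inflow - outflow} for a discrete 1-form.

    edges: iterable of (i, j, coefficient), one entry per directed edge i -> j.
    """
    div = defaultdict(float)
    for i, j, coeff in edges:
        div[j] += coeff  # the coefficient flows into node j
        div[i] -= coeff  # and out of node i
    return dict(div)


# A made-up 1-form on three nodes with two parallel edges from a to b.
alpha = [
    ("a", "b", 1.0),
    ("a", "b", 2.0),
    ("b", "c", 0.5),
    ("c", "a", 4.0),
]
boundary = graph_divergence(alpha)
print(boundary)
```

Since every edge contributes its coefficient once with a plus sign and once with a minus sign, the values over all nodes sum to zero.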

Written by Eric

November 19, 2011 at 11:27 pm

## Leverage Causes Fat Tails and Clustered Volatility

Doyne Farmer is awesome. I first ran into him back in 2001 (or maybe 2002) at the University of Chicago where he was giving a talk on order book dynamics with some awesome videos from the order book of the London Stock Exchange. He has another recent paper that also looks very interesting:

Leverage Causes Fat Tails and Clustered Volatility
Stefan Thurner, J. Doyne Farmer, John Geanakoplos
(Submitted on 11 Aug 2009 (v1), last revised 10 Jan 2010 (this version, v2))

We build a simple model of leveraged asset purchases with margin calls. Investment funds use what is perhaps the most basic financial strategy, called “value investing”, i.e. systematically attempting to buy underpriced assets. When funds do not borrow, the price fluctuations of the asset are normally distributed and uncorrelated across time. All this changes when the funds are allowed to leverage, i.e. borrow from a bank, to purchase more assets than their wealth would otherwise permit. During good times competition drives investors to funds that use more leverage, because they have higher profits. As leverage increases price fluctuations become heavy tailed and display clustered volatility, similar to what is observed in real markets. Previous explanations of fat tails and clustered volatility depended on “irrational behavior”, such as trend following. Here instead this comes from the fact that leverage limits cause funds to sell into a falling market: A prudent bank makes itself locally safer by putting a limit to leverage, so when a fund exceeds its leverage limit, it must partially repay its loan by selling the asset. Unfortunately this sometimes happens to all the funds simultaneously when the price is already falling. The resulting nonlinear feedback amplifies large downward price movements. At the extreme this causes crashes, but the effect is seen at every time scale, producing a power law of price disturbances. A standard (supposedly more sophisticated) risk control policy in which individual banks base leverage limits on volatility causes leverage to rise during periods of low volatility, and to contract more quickly when volatility gets high, making these extreme fluctuations even worse.

I completely agree with this idea. In fact, I discussed this concept with Francis Longstaff at the last advisory board meeting of UCLA’s financial engineering program. Back in December, I spent the majority of a flight back to Hong Kong from Europe doodling a bunch of math trying to express the idea in formulas, but didn’t come up with anything worth writing home about. But it seems like they make some good progress in this paper.

Basically, financial firms of all stripes have performance targets. In a period of decreasing volatility (as we were in preceding the crisis), asset returns tend to decrease as well. To compensate, firms tend to move out further along the risk spectrum and/or increase leverage to maintain a given return level. The dynamic here is that leverage tends to increase as volatility decreases. However, the increased leverage increases the chance of a tail event occurring, as we experienced.

At first glance, this paper captures a lot of the dynamics I’ve been wanting to see written down somewhere. Hopefully this gets some attention.

Written by Eric

April 5, 2011 at 11:12 am

## Modeling Currencies

I hope to begin some research into currencies. Before I come out with any result though, I thought I’d ask an open question and hope someone comes by with a response.

First of all, as a former scientist, thinking about currencies is very fun. See, for example, my previous article

This morning, I happened across a recent article that appeared on the arxiv:

The second paragraph really stood out:

One of the problems in foreign exchange research is that currencies are priced against each other so no independent numeraire exists. Any currency chosen as a numeraire will be excluded from the results, yet its intrinsic patterns can indirectly affect overall patterns. There is no standard solution to this issue or a standard numeraire candidate. Gold was considered, but rejected due to its high volatility. This is an important problem as different numeraires will give different results if strong multidimensional cross-correlations are present. Different bases can also generate different tree structures. The inclusion or exclusion of currencies from the sample can also give different results.

This is interesting because financial modeling is often about prices of securities or changes in prices. Currencies are about the relationship between prices. In graph theoretic (or category theoretic) terms, it is tempting to say that currency models should be about directed edges (or morphisms).

Is the best way to model currencies to choose some numeraire as is done in this paper? Or is there a way to study the relationships (morphisms) directly?

Written by Eric

October 28, 2010 at 6:01 pm