## Weighted Likelihood for Time-Varying Gaussian Parameter Estimation

In a previous article, we presented a weighted likelihood technique for estimating the parameters $\alpha$ of a probability density function $\rho(x|\alpha)$. The motivation is that for time series, we may wish to weigh more recent data more heavily. In this article, we will apply the technique to a simple Gaussian density

$$\rho(x|\mu,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right].$$

In this case, the log likelihood is given by

$$\log\mathcal{L}(\mu,\sigma) = \sum_{i=1}^N w_i \log\rho(x_i|\mu,\sigma) = -\frac{1}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^N w_i \left(x_i - \mu\right)^2.$$

Recall that the maximum likelihood occurs when

$$\frac{\partial}{\partial\mu}\log\mathcal{L} = 0 \quad\text{and}\quad \frac{\partial}{\partial\sigma}\log\mathcal{L} = 0.$$

A simple calculation demonstrates that this occurs when

$$\mu = \sum_{i=1}^N w_i x_i$$

and

$$\sigma^2 = \sum_{i=1}^N w_i \left(x_i - \mu\right)^2,$$

where the weights satisfy $\sum_{i=1}^N w_i = 1$.

Introducing a weighted expectation operator for a random variable $X$ with samples $x_1, \dots, x_N$ given by

$$E_w[X] = \sum_{i=1}^N w_i x_i,$$

the Gaussian parameters may be expressed in a familiar form via

$$\mu = E_w[X]$$

and

$$\sigma^2 = E_w\left[(X - \mu)^2\right].$$

This simple result justifies the use of weighted expectations for time-varying Gaussian parameter estimation. As we will see, this is also useful for coding financial time series analysis.
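To make this concrete, here is a minimal Python sketch of the estimator using exponential-decay weights; the `halflife` parameter, the function name, and the synthetic data are illustrative choices, not part of the derivation above.

```python
import numpy as np

def weighted_gaussian_fit(x, halflife):
    """Weighted MLE of Gaussian parameters, weighting recent samples more heavily."""
    n = len(x)
    decay = 0.5 ** (1.0 / halflife)        # exponential-decay factor per step
    w = decay ** np.arange(n - 1, -1, -1)  # newest sample gets the largest weight
    w /= w.sum()                           # enforce sum(w) == 1
    mu = np.sum(w * x)                     # mu = E_w[X]
    var = np.sum(w * (x - mu) ** 2)        # sigma^2 = E_w[(X - mu)^2]
    return mu, np.sqrt(var)

# Example: data whose mean drifts upward over time.
rng = np.random.default_rng(0)
x = rng.normal(loc=np.linspace(0.0, 1.0, 500), scale=0.2)
mu, sigma = weighted_gaussian_fit(x, halflife=50)
print(mu, sigma)  # mu tracks the recent level (~1.0) rather than the full-sample mean
```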

## 60 GHz Wireless – A Reality Check

The wireless revolution has been fascinating to watch. Radio (and micro) waves are transforming the way we live our lives. However, I’m increasingly seeing indications the hype may be getting ahead of itself and we’re beginning to have inflated expectations (c/o the hype cycle) about wireless broadband. In this post, I’d like to revisit some of my prior posts on the subject in light of something that has recently come to my attention: 60 GHz wireless.

### Wavelength Matters

As I outlined in Physics of Wireless Broadband, the most important property determining the propagation characteristics of radio (and micro) waves is wavelength. Technical news and marketing materials about wireless broadband refer to frequency, but there is a simple translation to wavelength (in cm) given by:

$$\lambda\ (\text{cm}) = \frac{30}{f\ (\text{GHz})}.$$

Ages ago when I generated those SAR images, cell phones operated at 900 MHz (0.9 GHz) corresponding to a wavelength of about 33 cm. More recent 3G and 4G wireless devices operate at higher carrier frequencies up to 2.5 GHz corresponding to a shorter wavelength of 12 cm. Earlier this month, the FCC announced plans to release bandwidth at 5 GHz (6 cm).
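As a quick sanity check, here is a tiny Python sketch reproducing these conversions (this is just $\lambda = c/f$ with $c \approx 3\times 10^{10}$ cm/s):

```python
# Wavelength (cm) = 30 / frequency (GHz), since c ~ 3e10 cm/s.
for f_ghz in [0.9, 2.5, 5.0, 60.0]:
    print(f"{f_ghz:5.1f} GHz -> {30.0 / f_ghz:5.2f} cm")
# 0.9 GHz -> 33.33 cm, 2.5 GHz -> 12 cm, 5 GHz -> 6 cm, 60 GHz -> 0.5 cm (5 mm)
```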

This frequency creep is partially due to the issues related to the ultimate wireless bandwidth speed limit I outlined, but is also driven by a slight misconception that can be found on Wikipedia:

> A key characteristic of bandwidth is that a band of a given width can carry the same amount of information, regardless of where that band is located in the frequency spectrum.

Although this is true from a pure information-theoretic perspective, when it comes to wireless broadband, the transmission of information is not determined by Shannon alone. One must also consider Maxwell, and there are far fewer people in the world who understand the latter than the former.

The propagation characteristics of 2G radio waves at 900 MHz (33 cm) are already quite different from 3G/4G microwaves at 2.5 GHz (12 cm), not to mention the newly announced 5 GHz (6 cm). That is why I was more than a little surprised to learn that organizations are seriously promoting 60 GHz WiFi. Plugging 60 GHz into our formula gives a wavelength of just 5 mm. This is important for three reasons: 1) directionality, 2) penetration, and 3) diffraction.

### Directionality

As I mentioned in Physics of Wireless Broadband, in order for an antenna to broadcast fairly uniformly in all directions, the antenna length should not be much more than half the carrier wavelength. At 60 GHz, this means the antenna should not be much larger than 2.5 mm. This is not feasible due to the small amount of energy transmitted/received by such a tiny antenna.

Consequently, the antenna would end up being very directional, i.e. it will have preferred directions for transmission/reception, and you’ll need to aim your wireless device toward the router. With the possible exception of being in an empty anechoic chamber, the idea that you’ll be able to carry around a wireless device operating at 60 GHz and maintain a good connection is wishful thinking to say the least.

### Penetration

If directionality weren’t an issue, the transmission characteristics of 60 GHz microwaves alone should dampen any hopes for gigabit wireless at this frequency. Although the physics of transmission is complicated, as a general rule of thumb, the depth at which electromagnetic waves penetrate material is related to wavelength. Early 2G (33 cm) and more recent 3G/4G (12 cm) do a decent job of penetrating walls and doors, etc.

At 60 GHz (5 mm), the signal would be severely challenged to penetrate a paperback novel, much less chairs, tables, or cubicle walls. As a result, to receive full signal strength, 60 GHz wireless requires direct unobstructed line of sight between the device and router.

### Photon Torpedoes vs. Molasses

The more interesting aspects of wireless signal propagation are diffraction and reflection, both of which can be understood via Huygens' beautiful principle and both of which depend on wavelength. Wireless signals do a reasonably good job of oozing around obstacles if the wavelength is long compared to the size of the obstacle, i.e. at low frequencies. Propagation is much better at lower frequencies because the signal can penetrate walls and doors, and for those obstacles that cannot be penetrated, you might still receive a signal because it can ooze around corners.

As the frequency of the signal increases, the wave stops behaving like molasses oozing around and through obstacles and begins acting more like photon torpedoes bouncing around the room like particles, and shadowing begins to occur. At 60 GHz, shadowing would be severe and communication would depend on direct line of sight or indirect line of sight via reflections. However, it is important to keep in mind that each time the signal bounces off an obstacle, its strength is significantly weakened.

### What Does it all Mean?

The idea that we can increase wireless broadband speeds simply by increasing the available bandwidth indefinitely is flawed because you must also consider the propagation characteristics of the carrier frequency. There is only a finite amount of spectrum available that has reasonable directionality, penetration, and diffraction characteristics. This inherent physical limitation will eventually lead us to the ultimate wireless broadband speed limit. There is no amount of engineering that can defeat Heisenberg.

There are ways to obtain high-bandwidth wireless signals, but you must sacrifice omnidirectional coverage. The extreme would be direct line-of-sight laser beam communications. Two routers can certainly communicate at gigabit speeds and beyond if they are connected by laser beams. Of course, there can be no obstacles between the routers or the signal will be lost. I can almost imagine a futuristic, Star Wars-like communication system where individual mobile devices are, in fact, tracked with laser beams, but I don't see that ever becoming a practical reality.

We still have some time before we reach this ultimate wireless broadband limit, but not to begin preparing for it now would be irresponsible. The only future-proof technology is fiber optics. Communities should avoid the temptation to forgo fiber plans in favor of wireless, because those who do so will soon bump into this wireless broadband limit and need to roll out fiber anyway.

## More fun with maximum likelihood estimation

A while ago, I wrote a post

### Fun with maximum likelihood estimation

where I jotted down some notes. I ended the post with the following:

> Note: The first time I worked through this exercise, I thought it was cute, but I would never compute $\mu$ and $\sigma$ as above, so the maximum likelihood estimation, as presented, is not meaningful to me. Hence, this is just a warm up for what comes next. Stay tuned…

Well, it has been over a year and I'm trying to get a friend interested in MLE for a side project we might work on together, so I thought it would be good to revisit it now.

To briefly review, the probability of observing $N$ independent samples $x_1, \dots, x_N$ may be approximated by

$$P \approx \prod_{i=1}^N \rho(x_i|\alpha)\,\Delta x,$$

where $\rho$ is a probability density and $\alpha$ represents the parameters we are trying to estimate. The key observation becomes clear after a slight change in perspective.

If we take the $N$th root of the above probability (and divide by $\Delta x$), we obtain the geometric mean of the individual densities, i.e.

$$\frac{P^{1/N}}{\Delta x} = \prod_{i=1}^N \rho(x_i|\alpha)^{1/N}.$$

In computing the geometric mean above, each sample is given the same weighting, i.e. $1/N$. However, we may have reason to weigh some samples more heavily than others, e.g. if we are studying samples from a time series, we may want to weigh the more recent data more heavily. This inspired me to replace $1/N$ with an arbitrary weight $w_i$ satisfying

$$\sum_{i=1}^N w_i = 1.$$

With no apologies for abusing terminology, I'll refer to this as the likelihood function

$$\mathcal{L}(\alpha) = \prod_{i=1}^N \rho(x_i|\alpha)^{w_i}.$$

Replacing $w_i$ with $1/N$ would result in the same parameter estimation as the traditional maximum likelihood method.

It is often more convenient to work with the log likelihood, which has an even more intuitive expression

$$\log\mathcal{L}(\alpha) = \sum_{i=1}^N w_i \log\rho(x_i|\alpha),$$

i.e. the log likelihood is simply the weighted (arithmetic) average of the log densities.
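As a sketch of how this might look in code, the following maximizes the weighted log likelihood numerically for a Gaussian density; any density with a log-pdf could be swapped in. The exponential-decay weights and all names here are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def weighted_nll(theta, x, w):
    """Negative weighted log likelihood: -sum_i w_i log rho(x_i | theta)."""
    mu, log_sigma = theta
    return -np.sum(w * norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

x = np.random.default_rng(1).normal(0.0, 1.0, 250)
w = 0.99 ** np.arange(len(x) - 1, -1, -1)  # exponentially decaying weights
w /= w.sum()                               # normalized so sum(w) == 1

# Maximize the weighted log likelihood (minimize its negative).
result = minimize(weighted_nll, x0=[0.0, 0.0], args=(x, w))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)
```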

I use this approach to estimate stable density parameters for time series analysis, which is better suited to capturing risk in the tails. For instance, I used this technique when generating the charts in a post from back in 2009:

### 80 Years of Daily S&P 500 Value-at-Risk Estimates

which was subsequently picked up by Felix Salmon of Reuters in

### How has VaR changed over time?

and Tracy Alloway of Financial Times in

### On baseline VaR

If I find a spare moment, which is rare these days, I’d like to update that analysis and expand it to other markets. A lot has happened since August 2009. Other markets I’d like to look at would include other equity markets as well as fixed income. Due to the ability to cleanly model skew, stable distributions are particularly useful for analyzing fixed income returns.

## Leveraged ETFs: Selling vs Hedging

In this brief note, we'll compare two similar leveraged ETF strategies. We begin by assuming a portfolio consists of a $\beta$-times leveraged bull ETF with daily return given by

$$r_{\text{bull}} = \beta\, r - \phi,$$

where $r$ is the daily return of the underlying index and $\phi$ is the fee charged by the manager, and some cash equivalent with daily return $r_c$. The daily portfolio return is given by

$$r_P = w\, r_{\text{bull}} + (1 - w)\, r_c,$$

where $w$ is the weight of the bull ETF. We wish to reduce our exposure to the index.

### Strategy 1

An obvious thing to do to reduce exposure is to sell some shares of the leveraged ETF. In this case, the weight of the ETF is reduced by $\Delta$ and the weight of cash increases by $\Delta$. The daily portfolio return is then

$$r_1 = (w - \Delta)\, r_{\text{bull}} + (1 - w + \Delta)\, r_c.$$

### Strategy 2

Another way to reduce exposure is to buy shares in the leveraged bear ETF. The daily return of the bear ETF is

$$r_{\text{bear}} = -\beta\, r - \phi.$$

The daily return of this strategy is

$$r_2 = w\, r_{\text{bull}} + \Delta\, r_{\text{bear}} + (1 - w - \Delta)\, r_c.$$

### Comparison

For most, I think it should be fairly obvious that Strategy 1 is preferred. However, I occasionally come across people with positions in both the bear and bull ETFs. The difference in the daily return of the two strategies is given by

$$r_1 - r_2 = 2\Delta\left(\phi + r_c\right).$$

In other words, if you reduce exposure by buying the bear ETF, you'll get hit both by fees and by the lost return on your cash equivalent.
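A quick numerical check of this algebra, following the notation above ($\beta$, $\phi$, $r_c$, $\Delta$); the specific fee and return figures are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
r = rng.normal(0.0003, 0.01, 10_000)            # simulated daily index returns
beta, phi, r_c = 3.0, 0.0095 / 252, 0.02 / 252  # leverage, daily fee, daily cash return
w, delta = 0.5, 0.1                             # bull ETF weight and exposure reduction

r_bull = beta * r - phi
r_bear = -beta * r - phi

# Strategy 1: sell `delta` of the bull ETF and hold the proceeds in cash.
r1 = (w - delta) * r_bull + (1 - w + delta) * r_c
# Strategy 2: keep the bull position and buy `delta` of the bear ETF with cash.
r2 = w * r_bull + delta * r_bear + (1 - w - delta) * r_c

# Both have the same index exposure (beta * (w - delta) in each case),
# but Strategy 1 earns more by 2 * delta * (phi + r_c) every day.
assert np.allclose(r1 - r2, 2 * delta * (phi + r_c))
```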

Unless you've got some interesting derivatives strategy (I'd love to hear about it), I recommend not holding both the bear and bull ETFs simultaneously.

Note: I remain long BGU (which is now SPXL) at a cost of US$36 as a long-term investment – despite experts warning against holding these things. It closed yesterday at US$90.92.

## Discrete Stochastic Calculus

This post is part of a series

In the previous post of this series, we found that when Cartesian coordinates are placed on a binary tree, the commutative relations are given by

$$[dx, x] = \frac{(\Delta x)^2}{\Delta t}\, dt, \quad [dx, t] = [dt, x] = \Delta t\, dx, \quad [dt, t] = \Delta t\, dt.$$

There are two distinct classes of discrete calculus depending on the relation between $\Delta x$ and $\Delta t$.

### Discrete Exterior Calculus

If we set $\Delta x = \Delta t$, the commutative relations reduce to

$$[dx, x] = [dt, t] = \Delta t\, dt \quad\text{and}\quad [dx, t] = [dt, x] = \Delta t\, dx,$$

and in the continuum limit, i.e. $\Delta t \to 0$, reduce to

$$[dx, x] = [dx, t] = [dt, x] = [dt, t] = 0.$$

In other words, when $\Delta x = \Delta t$, the commutative relations vanish in the continuum limit and the discrete calculus converges to the exterior calculus of differential forms.

Because of this, the discrete calculus on a binary tree with $\Delta x = \Delta t$ will be referred to as the **discrete exterior calculus**.

### Discrete Stochastic Calculus

If instead of $\Delta x = \Delta t$, we set $(\Delta x)^2 = \Delta t$, the commutative relations reduce to

$$[dx, x] = dt, \quad [dx, t] = [dt, x] = \Delta t\, dx, \quad [dt, t] = \Delta t\, dt,$$

and in the continuum limit, i.e. $\Delta t \to 0$, reduce to

$$[dx, x] = dt \quad\text{and}\quad [dx, t] = [dt, x] = [dt, t] = 0.$$

In this case, all commutative relations vanish in the continuum limit except

$$[dx, x] = dt.$$

In the paper:

I demonstrate how the continuum limit of the commutative relations gives rise to (a noncommutative version of) stochastic calculus, where $x$ plays the role of a Brownian motion.

Because of this, the discrete calculus on a binary tree with $(\Delta x)^2 = \Delta t$ will be referred to as the **discrete stochastic calculus**.
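One way to see why the scaling $(\Delta x)^2 = \Delta t$ is special: along any path down the binary tree, the squares of the coordinate increments sum exactly to the elapsed time, a discrete shadow of Itô's $dW^2 = dt$. A minimal sketch, with all names illustrative:

```python
import numpy as np

T, n = 1.0, 10_000
dt = T / n
dx = np.sqrt(dt)  # the discrete stochastic calculus scaling: (dx)^2 = dt

rng = np.random.default_rng(3)
steps = rng.choice([-dx, dx], size=n)  # one random path down the binary tree
x = np.cumsum(steps)                   # the coordinate x along the path

# The quadratic variation sum((delta x)^2) equals n * dt = T exactly,
# mirroring the surviving relation [dx, x] = dt.
print(np.sum(steps ** 2))  # 1.0 (up to rounding)
```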

To date, discrete stochastic calculus has found robust applications in mathematical finance and fluid dynamics. For instance, the application of discrete stochastic calculus to Black-Scholes option pricing was presented in

and the application to fluid dynamics was presented in

Both of these subjects will be addressed in more detail as part of this series of articles.

It should be noted that discrete calculus and its special cases of discrete exterior calculus and discrete stochastic calculus represent a new framework for numerical modeling. We are not taking continuum models built on continuum calculus and constructing finite approximations. Instead, we are building a robust mathematical framework that has finiteness built in from the outset. The resulting numerical models are not approximations, but exact models developed within a finite numerical framework. The framework itself converges to the continuum versions so that any numerical models built within this framework will automatically converge to the continuum versions (if such a thing is desired).

Discrete calculus provides a kind of *meta* algorithm. It is an algorithm for generating algorithms.

## A Note on Discrete Helmholtz Decomposition

*The following is a note I sent to my PhD advisor, Professor Weng Cho Chew, on September 13, 2011 after a discussion over dinner as he was headed back to UIUC from a 4-year stint as the Dean of Engineering at the University of Hong Kong.*

### Decomposing Finite Dimensional Inner Product Spaces

Given finite-dimensional inner product spaces $U$ and $V$, and a linear map $A: U \to V$, the adjoint map $A^\dagger: V \to U$ is the unique linear map satisfying the property

$$\langle A u, v \rangle_V = \langle u, A^\dagger v \rangle_U$$

for all $u \in U$ and $v \in V$.

In this section, we show that $V$ can be decomposed into two orthogonal subspaces

$$V = \operatorname{im}(A) \oplus \ker(A^\dagger).$$

This is a fairly simple exercise as any finite-dimensional inner product space can be decomposed into a subspace and its orthogonal complement, i.e.

$$V = \operatorname{im}(A) \oplus \operatorname{im}(A)^\perp.$$

The only thing to show is that $\operatorname{im}(A)^\perp = \ker(A^\dagger)$.

To do this, note whenever $v \in \operatorname{im}(A)^\perp$, then

$$0 = \langle A u, v \rangle_V = \langle u, A^\dagger v \rangle_U$$

for all $u \in U$. Thus, $v$ is also in $\ker(A^\dagger)$, i.e. $\operatorname{im}(A)^\perp \subseteq \ker(A^\dagger)$. Similarly, whenever $v \in \ker(A^\dagger)$, then

$$\langle A u, v \rangle_V = \langle u, A^\dagger v \rangle_U = 0$$

for all $u \in U$. Thus, $v$ is also in $\operatorname{im}(A)^\perp$, i.e. $\ker(A^\dagger) \subseteq \operatorname{im}(A)^\perp$. Since both inclusions hold, it follows that $\operatorname{im}(A)^\perp = \ker(A^\dagger)$.
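As a quick numerical illustration of $\operatorname{im}(A)^\perp = \ker(A^\dagger)$, the following Python sketch specializes to standard inner products so that the adjoint is simply the matrix transpose (that specialization, and all names, are choices made here for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 3))  # a linear map R^3 -> R^5 (standard inner products, so adjoint = transpose)

# Build an orthonormal basis of ker(A^T) from the SVD of A: the left singular
# vectors beyond the rank span the orthogonal complement of im(A).
U, s, Vt = np.linalg.svd(A)
rank = np.sum(s > 1e-12)
ker_At = U[:, rank:]                         # basis of ker(A^T) = im(A)^perp

print(np.allclose(A.T @ ker_At, 0))          # these vectors lie in ker(A^T)
print(np.allclose(ker_At.T @ A, 0))          # and are orthogonal to im(A)
print(rank + ker_At.shape[1] == A.shape[0])  # dims add up: V = im(A) + ker(A^T)
```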

### Hodge-Helmholtz Decomposition

Given finite-dimensional inner product spaces $U$, $V$, $W$, and linear maps $A: U \to V$, $B: V \to W$ such that $B \circ A = 0$, we wish to show that the inner product space $V$ may be decomposed into three orthogonal subspaces

$$V = \operatorname{im}(A) \oplus \operatorname{im}(B^\dagger) \oplus \ker(\Delta),$$

where $\Delta = A A^\dagger + B^\dagger B$.

To show this, note that if $v \in \ker(\Delta)$, then

$$0 = \langle v, \Delta v \rangle_V = \left\| A^\dagger v \right\|^2 + \left\| B v \right\|^2,$$

but this implies $A^\dagger v = 0$ and $B v = 0$. Conversely, if $A^\dagger v = 0$ and $B v = 0$, then $v$ is trivially in $\ker(\Delta)$. In other words,

$$\ker(\Delta) = \ker(A^\dagger) \cap \ker(B).$$

Finally, since $B \circ A = 0$, we also have $A^\dagger \circ B^\dagger = 0$. Consequently, when $v \in \operatorname{im}(B^\dagger)$, then $A^\dagger v = 0$ so

$$\operatorname{im}(B^\dagger) \subseteq \ker(A^\dagger).$$

Applying the decomposition from the previous section twice, we conclude that

$$V = \operatorname{im}(A) \oplus \ker(A^\dagger)$$

and since $\operatorname{im}(B^\dagger) \subseteq \ker(A^\dagger)$, it follows that

$$\ker(A^\dagger) = \operatorname{im}(B^\dagger) \oplus \left( \ker(A^\dagger) \cap \ker(B) \right),$$

which may be expressed simply as

$$\ker(A^\dagger) = \operatorname{im}(B^\dagger) \oplus \ker(\Delta).$$

Putting this together, we see the desired Hodge-Helmholtz decomposition

$$V = \operatorname{im}(A) \oplus \operatorname{im}(B^\dagger) \oplus \ker(\Delta).$$

### Computational Electromagnetics

The preceding discussion is quite general and holds for any finite-dimensional inner product spaces $U$, $V$, $W$, and any linear maps $A: U \to V$, $B: V \to W$ satisfying $B \circ A = 0$. In this section, we specialize to computational electromagnetics.

Consider a discretization of a surface consisting of $N_0$ vertices, $N_1$ directed edges, and $N_2$ oriented triangular faces. If we associate a degree of freedom to each vertex, the span of these degrees of freedom forms an $N_0$-dimensional vector space $V_0$. Associating a degree of freedom to each directed edge forms an $N_1$-dimensional vector space $V_1$ and associating a degree of freedom to each oriented face forms an $N_2$-dimensional vector space $V_2$. For concreteness, vectors in $V_0$ will be expanded via

$$f = \sum_{i=1}^{N_0} f_i\, e_i^{(0)},$$

where $f_i$ denotes the degree of freedom on the $i^{th}$ vertex, vectors in $V_1$ will be expanded via

$$u = \sum_{i=1}^{N_1} u_i\, e_i^{(1)},$$

where $u_i$ denotes the degree of freedom on the $i^{th}$ directed edge, and vectors in $V_2$ will be expanded via

$$g = \sum_{i=1}^{N_2} g_i\, e_i^{(2)},$$

where $g_i$ denotes the degree of freedom on the $i^{th}$ oriented face.

To turn $V_0$, $V_1$, and $V_2$ into inner product spaces, we need to define three respective inner products. This can be done by defining three sets of basis functions $\phi_i$, $\boldsymbol{\Lambda}_i$, and $\psi_i$. $\phi_i$ and $\psi_i$ take values defined at vertices and faces, respectively, and map these to functions defined over each face. Similarly, $\boldsymbol{\Lambda}_i$ takes values defined along each edge and maps these to vector fields defined over each face.

The basis functions linearly turn vectors in $V_0$, $V_1$, and $V_2$ into functions and vector fields defined over the surface via

$$f(\mathbf{r}) = \sum_{i=1}^{N_0} f_i\, \phi_i(\mathbf{r}), \qquad g(\mathbf{r}) = \sum_{i=1}^{N_2} g_i\, \psi_i(\mathbf{r}),$$

and

$$\mathbf{u}(\mathbf{r}) = \sum_{i=1}^{N_1} u_i\, \boldsymbol{\Lambda}_i(\mathbf{r}).$$

The inner products may then be defined in terms of basis functions via

$$\langle f, \tilde{f} \rangle_0 = \sum_{i,j} f_i\, \tilde{f}_j \int_S \phi_i\, \phi_j\, dS, \qquad \langle g, \tilde{g} \rangle_2 = \sum_{i,j} g_i\, \tilde{g}_j \int_S \psi_i\, \psi_j\, dS,$$

and

$$\langle u, \tilde{u} \rangle_1 = \sum_{i,j} u_i\, \tilde{u}_j \int_S \boldsymbol{\Lambda}_i \cdot \boldsymbol{\Lambda}_j\, dS.$$

Letting $\bar{f}$, $\bar{u}$, and $\bar{g}$ denote column matrix representations of vectors in $V_0$, $V_1$, and $V_2$, the inner products may be expressed in terms of matrix-vector products via

$$\langle f, \tilde{f} \rangle_0 = \bar{f}^T G_0\, \bar{\tilde{f}}, \qquad \langle g, \tilde{g} \rangle_2 = \bar{g}^T G_2\, \bar{\tilde{g}},$$

and

$$\langle u, \tilde{u} \rangle_1 = \bar{u}^T G_1\, \bar{\tilde{u}},$$

where $G_0$, $G_1$, and $G_2$ denote the Gram matrices of the respective basis functions.

The matrix-vector representation is helpful for explicitly expressing the adjoint of a linear map $A: V_0 \to V_1$ via

$$\bar{A}^\dagger = G_0^{-1} \bar{A}^T G_1$$

so that

$$\langle A f, u \rangle_1 = \left(\bar{A}\bar{f}\right)^T G_1 \bar{u} = \bar{f}^T G_0 \left(\bar{A}^\dagger \bar{u}\right) = \langle f, A^\dagger u \rangle_0.$$

Similarly, the adjoint of a linear map $B: V_1 \to V_2$ may be represented in matrix form via

$$\bar{B}^\dagger = G_1^{-1} \bar{B}^T G_2.$$
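The following is a small numerical sketch of this adjoint construction, with randomly generated symmetric positive-definite matrices standing in for the Gram matrices $G_0$ and $G_1$ (the sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def random_gram(n):
    """A random symmetric positive-definite matrix standing in for a Gram matrix."""
    M = rng.normal(size=(n, n))
    return M @ M.T + n * np.eye(n)

G0, G1 = random_gram(4), random_gram(6)  # inner products <f1, f2>_0 = f1^T G0 f2, etc.
A = rng.normal(size=(6, 4))              # matrix of a linear map V0 -> V1

A_dag = np.linalg.solve(G0, A.T @ G1)    # the adjoint formula: A† = G0^{-1} A^T G1

f, u = rng.normal(size=4), rng.normal(size=6)
lhs = (A @ f) @ G1 @ u                   # <A f, u>_1
rhs = f @ G0 @ (A_dag @ u)               # <f, A† u>_0
print(np.isclose(lhs, rhs))              # True: the defining property of the adjoint
```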

In computational electromagnetics, a fundamental linear map is the exterior derivative, which will be denoted $d_k: V_k \to V_{k+1}$ for $k \in \{0, 1\}$. Since $V_0$, $V_1$, and $V_2$ are finite dimensional, $d_k$ has a sparse matrix representation $\bar{d}_k$.

For the sake of interpretation, the matrix $\bar{d}_0$ may be thought of as the gradient along the respective directed edge, $\bar{d}_1$ may be thought of as the curl of the edge vector field around each oriented face, $\bar{d}_1^\dagger$ may be thought of as the transverse gradient[1] across each directed edge, and $\bar{d}_0^\dagger$ may be thought of as the divergence of the edge vector field.

Critically note,

$$d_1 \circ d_0 = 0.$$

As a result, the inner product space $V_1$ of edge vector fields decomposes into

$$V_1 = \operatorname{im}(d_0) \oplus \operatorname{im}(d_1^\dagger) \oplus \ker(\Delta_1),$$

where $\Delta_1 = d_0 d_0^\dagger + d_1^\dagger d_1$. In other words, any edge vector $u$ may be expressed as

$$u = d_0 f + d_1^\dagger g + h$$

for some $f \in V_0$, $g \in V_2$, and $h \in \ker(\Delta_1)$. The above may be thought of as a discrete version of the Hodge-Helmholtz decomposition for computational electromagnetics.
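To make the decomposition concrete, here is a minimal Python sketch on a two-triangle mesh, using identity Gram matrices so that adjoints reduce to transposes (the mesh, the edge vector, and all names are illustrative choices):

```python
import numpy as np

# Two triangles (0,1,2) and (0,2,3) sharing edge (0,2): 4 vertices, 5 directed edges.
# d0: exterior derivative V0 -> V1 (signed edge-vertex incidence, "gradient").
d0 = np.array([[-1,  1,  0,  0],   # edge 0: 0 -> 1
               [ 0, -1,  1,  0],   # edge 1: 1 -> 2
               [-1,  0,  1,  0],   # edge 2: 0 -> 2
               [ 0,  0, -1,  1],   # edge 3: 2 -> 3
               [-1,  0,  0,  1]])  # edge 4: 0 -> 3
# d1: exterior derivative V1 -> V2 (signed face-edge incidence, "curl").
d1 = np.array([[ 1,  1, -1,  0,  0],   # face (0,1,2)
               [ 0,  0,  1,  1, -1]])  # face (0,2,3)

print(np.all(d1 @ d0 == 0))  # d o d = 0, the discrete "curl grad = 0"

# Decompose an arbitrary edge vector u = d0 @ f + d1.T @ g + h.
u = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
f = np.linalg.lstsq(d0, u, rcond=None)[0]    # least squares = projection onto im(d0)
g = np.linalg.lstsq(d1.T, u, rcond=None)[0]  # projection onto im(d1^T)
grad_part, curl_part = d0 @ f, d1.T @ g
h = u - grad_part - curl_part                # harmonic remainder

print(np.allclose(grad_part @ curl_part, 0))  # the two pieces are orthogonal
print(np.allclose(h, 0))                      # this mesh has no holes, so no harmonic part
```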

### References

This note is an informal (and quickly drafted) document intended to help explain Hodge-Helmholtz decomposition in computational electromagnetics. No claim of any original content is intended and a proper literature search was not performed. For pointers to some related material with more complete references, see the following:

- PyDEC: Software and Algorithms for Discretization of Exterior Calculus
- Least Squares Ranking on Graphs

[1] If the degree of freedom associated to an oriented face is interpreted as the magnitude of a vector normal to the face, $\bar{d}_1^\dagger$ may be thought of as the curl of this normal vector field along the directed edge.

## Network Theory and Discrete Calculus – Coordinates

This post is part of a series

When the binary tree was presented in the context of discrete calculus, the following small section of the tree was illustrated to establish the way the nodes are labelled

Two sets of coordinates were introduced on the binary tree:

- Cartesian coordinates
- Graph coordinates

The following illustrates the binary tree when we zoom out a bit:

Cartesian coordinates were defined such that

- and

resulting in

- and

Although Cartesian coordinates often help to relate discrete calculus to continuum calculus, the expressions are not always the most natural to work with when performing computations. One reason for this can be understood by overlaying the Cartesian coordinate lines onto the binary tree.

On the other hand, graph coordinates are defined such that

resulting in

- and

Computations are often cleaner when using graph coordinates. One reason for this can be understood by overlaying the graph coordinate lines onto the binary tree.

For instance, the commutative relations in graph coordinates are given by

whereas the commutative relations for Cartesian coordinates are given by

$$[dx, x] = \frac{(\Delta x)^2}{\Delta t}\, dt, \quad [dx, t] = [dt, x] = \Delta t\, dx, \quad [dt, t] = \Delta t\, dt.$$

The cross commutative relations between the two sets of coordinates are given by

As a final note, for any discrete 0-form , the above indicates that