Phorgy Phynance

Fun with maximum likelihood estimation

The following is a fun little exercise that most statistics students have probably worked out as a homework assignment at some point. Since I have found myself rederiving it a few times over the years, I decided to write it down here to save myself some time the next time it comes up.

Given a probability density \rho(x), we can approximate the probability of a sample falling within a region \Delta x around the value x_i\in\mathbb{R} by

P(x_i) = \rho(x_i)\Delta x.
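As a quick sanity check on this approximation, here is a minimal Python sketch comparing \rho(x)\Delta x with the exact probability of landing in a window of width \Delta x. It assumes a standard normal density and a window centered on x; the function names are made up for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function, written in terms of math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def approx_prob(x, dx):
    """rho(x) * dx: the approximation used in the text."""
    return normal_pdf(x) * dx

def exact_prob(x, dx):
    """Exact probability of the window [x - dx/2, x + dx/2] (assumed centered)."""
    return normal_cdf(x + dx / 2.0) - normal_cdf(x - dx / 2.0)

def rel_err(x, dx):
    """Relative error of the approximation against the exact window probability."""
    exact = exact_prob(x, dx)
    return abs(approx_prob(x, dx) - exact) / exact

for dx in (0.5, 0.1, 0.01):
    print(f"dx={dx}: approx={approx_prob(0.5, dx):.6f} "
          f"exact={exact_prob(0.5, dx):.6f} rel_err={rel_err(0.5, dx):.2e}")
```

The relative error should shrink as \Delta x gets smaller, which is all the approximation needs for what follows.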

Similarly, the probability of observing N independent samples X\in\mathbb{R}^N is approximated by

P(X) = \prod_{i = 1}^N P(x_i) = \left(\Delta x\right)^N \prod_{i=1}^N \rho(x_i).

In the case of a normal distribution, the density has two parameters, \mu and \nu, and we have

\rho(x|\mu,\nu) = \frac{1}{\sqrt{\pi\nu}} \exp\left[-\frac{(x-\mu)^2}{\nu}\right].

The probability of observing the N samples X is then approximated by

P(X|\mu,\nu) =   \left(\frac{\Delta x}{\sqrt{\pi}}\right)^N \nu^{-N/2} \exp\left[-\frac{1}{\nu} \sum_{i=1}^N (x_i - \mu)^2\right].
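To double check the algebra, the following Python sketch evaluates the expression above in two ways: sample by sample as a product of \rho(x_i)\Delta x, and via the closed form. Working with logarithms is my own choice here, made only to avoid underflow for large N; the sample values, parameter values, and function names are invented for illustration.

```python
import math

def log_density(x, mu, nu):
    """log rho(x | mu, nu) for the parameterization used in the text."""
    return -0.5 * math.log(math.pi * nu) - (x - mu) ** 2 / nu

def log_prob_product(xs, mu, nu, dx):
    """log P(X) built sample by sample: sum of log(rho(x_i) * dx)."""
    return sum(log_density(x, mu, nu) + math.log(dx) for x in xs)

def log_prob_closed_form(xs, mu, nu, dx):
    """log of the closed-form expression for P(X | mu, nu)."""
    n = len(xs)
    sum_sq = sum((x - mu) ** 2 for x in xs)
    return (n * math.log(dx / math.sqrt(math.pi))
            - 0.5 * n * math.log(nu)
            - sum_sq / nu)

xs = [0.3, -1.2, 0.8, 2.1, -0.4]  # made-up sample values
print(log_prob_product(xs, mu=0.0, nu=2.0, dx=0.01))
print(log_prob_closed_form(xs, mu=0.0, nu=2.0, dx=0.01))
```

The two printed numbers should agree up to floating-point rounding.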

The idea behind maximum likelihood estimation is that the parameters should be chosen such that the probability of observing the given samples is maximized. At a maximum, the differential vanishes, i.e.

dP(X|\mu,\nu) = \frac{\partial P(X|\mu,\nu)}{\partial \mu} d\mu + \frac{\partial P(X|\mu,\nu)}{\partial \nu} d\nu = 0.

Since d\mu and d\nu are independent, this vanishes only when both components vanish, i.e.

\frac{\partial P(X|\mu,\nu)}{\partial \mu} = \frac{\partial P(X|\mu,\nu)}{\partial \nu} = 0.

The first component is given by

\frac{\partial P(X|\mu,\nu)}{\partial \mu} = P(X|\mu,\nu) \left[\frac{2}{\nu} \sum_{i=1}^N (x_i - \mu)\right]

and vanishes when

\mu = \frac{1}{N} \sum_{i = 1}^N x_i.

The second component is given by

\frac{\partial P(X|\mu,\nu)}{\partial \nu} = P(X|\mu,\nu) \left[-\frac{N}{2\nu} + \frac{1}{\nu^2} \sum_{i=1}^N (x_i - \mu)^2\right]

and vanishes when

\sigma^2 = \frac{1}{N} \sum_{i=1}^N (x_i-\mu)^2,

where \nu = 2\sigma^2.
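For anyone who wants to see the result numerically, here is a rough Python sketch. It draws synthetic normal samples (the true parameters, seed, and sample size are arbitrary choices of mine), computes \mu and \sigma^2 from the two formulas above, and then checks by central finite differences that both partial derivatives of \log P vanish at those values. Using \log P rather than P is an assumption made for numerical convenience; the two have the same stationary points because the logarithm is monotonic.

```python
import math
import random
import statistics

def log_likelihood(xs, mu, nu):
    """log P(X | mu, nu) without the constant N * log(dx / sqrt(pi))."""
    n = len(xs)
    return -0.5 * n * math.log(nu) - math.fsum((x - mu) ** 2 for x in xs) / nu

def mle_normal(xs):
    """Closed-form estimates from the text: sample mean and (1/N) variance."""
    mu_hat = statistics.fmean(xs)
    sigma2_hat = math.fsum((x - mu_hat) ** 2 for x in xs) / len(xs)
    return mu_hat, sigma2_hat

random.seed(0)  # fixed seed so the run is repeatable
xs = [random.gauss(1.5, 2.0) for _ in range(10_000)]  # true mu = 1.5, sigma = 2

mu_hat, sigma2_hat = mle_normal(xs)
nu_hat = 2.0 * sigma2_hat  # nu = 2 * sigma^2, as in the text
print(f"mu_hat={mu_hat:.4f}, sigma2_hat={sigma2_hat:.4f}")

# Central finite differences of the log-likelihood at the estimates.
h = 1e-5
d_mu = (log_likelihood(xs, mu_hat + h, nu_hat)
        - log_likelihood(xs, mu_hat - h, nu_hat)) / (2 * h)
d_nu = (log_likelihood(xs, mu_hat, nu_hat + h)
        - log_likelihood(xs, mu_hat, nu_hat - h)) / (2 * h)
print(f"d/dmu={d_mu:.2e}, d/dnu={d_nu:.2e}")
```

Both finite differences should come out close to zero, and the estimates should land near the parameters used to generate the data. The standard library's `statistics.pvariance(xs, mu_hat)` computes the same 1/N quantity as `sigma2_hat`.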

Note: The first time I worked through this exercise, I thought it was cute, but I would never compute \mu and \sigma^2 as above, so the maximum likelihood estimation, as presented, is not meaningful to me. Hence, this is just a warm-up for what comes next. Stay tuned…

Written by Eric

January 2, 2011 at 10:56 pm