MCMC / Particle Filtering

Gibbs Sampling, Metropolis-Hastings, and Particle Filtering (Sequential Monte Carlo) are sampling-based methods for approximating distributions (joint or marginal) that are difficult to compute directly.

Particle

All of the methods above use the concept of a particle, which is simply a complete assignment of all model variables. For example:

             $X_1$  $X_2$  $X_3$
Particle 1     1      0      0
Particle 2     1      0      1
Particle 3     0      1      1
Particle 4     0      1      0
...           ...    ...    ...

If the particles are sampled from some distribution, then we can compute the empirical distribution as an approximation of the true distribution.
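
For instance, here is a minimal Python sketch of computing the empirical distribution from a set of particles; the particle values themselves are made up:

```python
from collections import Counter

# Hypothetical particles: complete assignments of (X1, X2, X3).
particles = [(1, 0, 0), (1, 0, 1), (0, 1, 1), (0, 1, 0), (1, 0, 0)]

# Empirical distribution: the relative frequency of each assignment.
counts = Counter(particles)
empirical = {x: n / len(particles) for x, n in counts.items()}
# empirical[(1, 0, 0)] == 0.4 approximates P(X1=1, X2=0, X3=0).
```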

MCMC Methods

Markov Chain Monte Carlo (MCMC) algorithms are methods for randomly sampling particles from a complicated distribution by simulating a Markov chain whose stationary distribution is the target distribution.

Gibbs Sampling

Gibbs sampling is a popular MCMC method. It is used when the joint probability $\prob{X_1,\dots,X_n}$ is complicated but the conditional probability $\prob{X_i\mid X_\text{other}}$ is easy to compute.

  • Initialize a random sample $x = (X_1=v_1,\dots,X_n=v_n)$
  • Until convergence:
    • Choose a variable $X_i$
    • Fix the values of other variables, and then compute $$w(v) = \prob{X_i=v\mid X_{\text{other}} = v_{\text{other}}}$$ for all possible values $v$ of $X_i$.
    • Replace $x \gets x[X_i = v]$ where $v$ is sampled from the distribution $w$.
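
To make the loop concrete, here is a minimal Python sketch of Gibbs sampling over binary variables. The `conditional` argument and the Ising-style toy model are hypothetical stand-ins for whatever $\prob{X_i\mid X_\text{other}}$ your model provides.

```python
import math
import random

def gibbs_sample(n_vars, conditional, n_iters=10_000):
    """Gibbs sampling over n binary variables.

    `conditional(i, x)` must return P(X_i = 1 | X_other), which is
    assumed to be easy to compute for the model at hand.
    """
    x = [random.randint(0, 1) for _ in range(n_vars)]  # random initial sample
    samples = []
    for _ in range(n_iters):
        i = random.randrange(n_vars)                 # choose a variable X_i
        p1 = conditional(i, x)                       # w(1); w(0) = 1 - w(1)
        x[i] = 1 if random.random() < p1 else 0      # sample X_i from w
        samples.append(tuple(x))                     # record the particle
    return samples

# Hypothetical toy model: an Ising-style chain where each X_i prefers to
# agree with its neighbors, so the conditional is cheap to compute.
def toy_conditional(i, x, beta=0.5):
    s = sum(1 if x[j] == 1 else -1 for j in (i - 1, i + 1) if 0 <= j < len(x))
    return 1.0 / (1.0 + math.exp(-2.0 * beta * s))  # P(X_i = 1 | neighbors)

samples = gibbs_sample(5, toy_conditional)
```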

Some variants of Gibbs sampling include:

  • Block Gibbs sampling: Choose multiple variables at the same time. The function $w$ becomes $$w(v_1,\dots,v_r) = \prob{X_{i_1}=v_1,\dots,X_{i_r}=v_r\mid X_{\text{other}} = v_{\text{other}}}$$
  • Collapsed Gibbs sampling: Instead of conditioning on all other variables $X_\text{other}$, marginalize some of them out first. This is only a good idea when the marginalization is easy to compute.

Metropolis-Hastings

Another popular MCMC algorithm is the Metropolis-Hastings algorithm, which allows us to sample from an unnormalized distribution $f$.

  • Initialize a random sample $x$
  • Define a proposal density $Q(x_\text{next}\mid x_\text{prev})$; for example, a Gaussian centered at $x_\text{prev}$. In the simple (Metropolis) version below, $Q$ must be symmetric: $Q(x\mid y) = Q(y\mid x)$. The general Metropolis-Hastings algorithm handles asymmetric proposals by multiplying the acceptance ratio by $Q(x\mid x')/Q(x'\mid x)$.
  • Until bored:
    • Sample $x'$ from $Q(\cdot\mid x)$
    • Compute the acceptance ratio $$\alpha = \frac{f(x')}{f(x)}$$
    • If $\alpha \geq 1$, then immediately accept $x'$. Otherwise, accept $x'$ with probability $\alpha$.
    • If $x'$ is accepted, use $x'$ for the next iteration. Otherwise, keep using $x$.

The acceptance ratio $\alpha$ indicates how likely $x'$ is relative to the current sample $x$.
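
Below is a minimal Python sketch of the symmetric-proposal (Metropolis) version described above; the target `f` is a made-up unnormalized mixture of two Gaussian bumps.

```python
import math
import random

def metropolis_hastings(f, x0, step=1.0, n_iters=10_000):
    """Sample from an unnormalized density f with a Gaussian proposal.

    Q(x' | x) = Normal(x, step^2) is symmetric, so the acceptance
    ratio reduces to f(x') / f(x).
    """
    x = x0
    samples = []
    for _ in range(n_iters):
        x_new = random.gauss(x, step)          # sample x' from Q(. | x)
        alpha = f(x_new) / f(x)                # acceptance ratio
        if alpha >= 1 or random.random() < alpha:
            x = x_new                          # accept x'
        samples.append(x)                      # otherwise keep the old x
    return samples

# Hypothetical target: an unnormalized mixture of two Gaussian bumps.
f = lambda x: math.exp(-(x - 2.0) ** 2) + 0.5 * math.exp(-(x + 2.0) ** 2)
samples = metropolis_hastings(f, x0=0.0)
```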

Problems with MCMC

Although (irreducible and aperiodic) MCMC chains are guaranteed to converge to the target distribution, MCMC methods usually have two disadvantages:

  • The samples are correlated. In Metropolis-Hastings, this can be mitigated by occasionally changing the proposal density $Q$ so that $Q(x_\text{far-away}\mid x)$ gets more probability mass, or by thinning (keeping only every $k$-th sample).
  • During the burn-in period (the first 1000 samples or so; there is no good theoretical guarantee on its length), the samples may not be related to the target distribution at all. So throw away the burn-in samples, as sketched below.
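
A minimal sketch of the usual post-processing, assuming `samples` came from a sampler like the Metropolis-Hastings sketch above; the cutoff and stride are arbitrary choices, not theoretically justified values:

```python
# Assumes `samples` is the list returned by an MCMC sampler.
burn_in, thin = 1000, 10
kept = samples[burn_in::thin]  # drop the burn-in prefix, keep every 10th sample
```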

Particle Filtering (Sequential Monte Carlo)

For simplicity, let's consider only the chain-structured graph:

(Figure: hidden chain $X_1 \to X_2 \to X_3 \to \cdots$, where each hidden state $X_i$ emits an observed evidence variable $E_i$.)

We first review beam search:

  • Initialize $C = \set{\emptyset}$
  • For each $i$:
    • Extend: $$C \gets \set{x_{1:i-1} \cup \set{X_i = x_i} : x_{1:i-1} \in C, \text{all possible } x_i}$$
    • Prune $C$ to the top $K$ partial assignments with the highest weights.
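
A minimal Python sketch of beam search as described above; `values` (the domain of each $X_i$) and `weight` (the scoring function for partial assignments) are hypothetical placeholders:

```python
def beam_search(n_steps, values, weight, K=10):
    """Beam search over a chain X_1, ..., X_n of discrete variables."""
    C = [()]                                   # start from the empty assignment
    for _ in range(n_steps):
        # Extend: every partial assignment times every possible value x_i.
        C = [xs + (v,) for xs in C for v in values]
        # Prune: keep the top-K partial assignments by weight.
        C = sorted(C, key=weight, reverse=True)[:K]
    return C

# Hypothetical usage: 3 binary variables, scored by how many 1s they contain.
best = beam_search(3, values=[0, 1], weight=sum, K=2)
```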

There are 2 problems with beam search:

  • Extend: we need to consider all possible values $x_i$, which is expensive.
  • Prune: the set of top $K$ partial assignments is usually not diverse.

Particle filtering solves both problems. Let us define:

  • Proposal distribution $\pi_i(x_i\mid x_{1:i-1})$ (similar to Metropolis-Hastings), which guesses the value of $X_i$ given the previous values. For an HMM, we usually use the transition probability $$\pi_i(x_i\mid x_{1:i-1}) = \prob{x_i\mid x_{i-1}}$$
  • Reweighting distribution: $$w_i(x_{1:i}) = \frac{\prob{x_{1:i}}/\prob{x_{1:i-1}}}{\pi_i(x_i\mid x_{1:i-1})}$$ where the numerator is the product of potentials that gets multiplied in when we reach $X_i$ (including the evidence potentials). This reweighting function corrects for the "guessed" information in $\pi_i$. For an HMM, $$w_i(x_{1:i}) = \frac{\prob{x_i\mid x_{i-1}}\prob{e_i\mid x_i}}{\prob{x_i\mid x_{i-1}}} = \prob{e_i\mid x_i}$$

The algorithm is as follows:

  • Initialize $C = \set{\emptyset}$
  • For each $i$:
    • Extend: $$C \gets \set{x_{1:i-1} \cup \set{X_i = x_i} : x_{1:i-1} \in C, x_i \sim \pi_i(x_i\mid x_{1:i-1})}$$ Only sample one $x_i$ for each element in $C$. (Note that $C$ is a multiset: multiple elements $x_{1:i-1}$ can share the same value.)
    • Reweight: Compute weight $w_i(x_{1:i})$ for each $x_{1:i}$ in $C$
    • Resample: $C \gets$ sample $K$ elements independently from $C$ according to the computed weight. Note that we can get repeated samples.
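
Putting the three steps together, here is a minimal Python sketch; `propose` and `weight` are hypothetical placeholders for $\pi_i$ and $w_i$. For an HMM, `propose` would sample from the transition $\prob{x_i\mid x_{i-1}}$ and `weight` would return the emission likelihood $\prob{e_i\mid x_i}$.

```python
import random

def particle_filter(n_steps, propose, weight, K=100):
    """Particle filtering over a chain X_1, ..., X_n.

    `propose(i, xs)` samples x_i from the proposal pi_i(. | x_{1:i-1});
    `weight(i, xs)` computes the reweighting factor w_i(x_{1:i}).
    """
    particles = [() for _ in range(K)]   # K copies of the empty assignment
    for i in range(n_steps):
        # Extend: sample one x_i per particle from the proposal.
        particles = [xs + (propose(i, xs),) for xs in particles]
        # Reweight: compute w_i(x_{1:i}) for each particle.
        weights = [weight(i, xs) for xs in particles]
        # Resample: draw K particles with replacement, proportional to w_i.
        particles = random.choices(particles, weights=weights, k=K)
    return particles

# Hypothetical HMM-style usage: binary states with made-up probabilities.
def propose(i, xs):                      # pi_i: sticky transition probability
    prev = xs[-1] if xs else 0
    return prev if random.random() < 0.8 else 1 - prev

def weight(i, xs):                       # w_i: emission likelihood P(e_i | x_i)
    return 0.9 if xs[-1] == 1 else 0.1

particles = particle_filter(10, propose, weight)
```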

Particle filtering can also be viewed as a recursive version of importance sampling.
