As a running example, we will use the problem of mixture modeling, starting with the parametric case.
Finite Mixture Models
Suppose we have data $x_1, \dots, x_N$ and we want to group them into clusters. Each cluster $j$ has distribution $F(\theta_j)$. For example, $F$ can be multivariate Gaussian, and $\theta_j$ will be $(\mu_j, \Sigma_j)$.
Suppose we know the number of clusters $K$. There are two ways to model the mixture.
View 1
Let $z_i \in \set{1,\dots,K}$ be the cluster assignment of $x_i$. To sample $x_1, \dots, x_N$, for each $i$:
- $z_i \sim \Mr{Multi}(\pi) = \Mr{Multi}(\pi_1, \dots, \pi_K)$
- $x_i \sim F(\theta_{z_i})$
The priors are
- $\theta_j \sim H(\lambda)$ for some conjugate prior $H$ of $F$
- $\pi \sim \Mr{Dir}(\alpha_1, \dots, \alpha_K)$. Usually, we use the symmetric choice $\alpha_j = \alpha_0 / K$ for every cluster $j$.
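The generative process of View 1 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: $F$ is taken to be a one-dimensional Gaussian with unit variance, so $\theta_j$ is just a mean, and $H$ is taken to be $N(0, 10^2)$. The function name `sample_view1` and its default arguments are hypothetical, not part of the notes.

```python
import numpy as np

def sample_view1(N, K, alpha0=1.0, seed=0):
    """Draw N points from a K-component 1-D Gaussian mixture (View 1)."""
    rng = np.random.default_rng(seed)
    # pi ~ Dir(alpha0/K, ..., alpha0/K): mixing proportions
    pi = rng.dirichlet(np.full(K, alpha0 / K))
    # theta_j ~ H: assumed here to be N(0, 10^2) on the cluster mean
    theta = rng.normal(0.0, 10.0, size=K)
    # z_i ~ Multi(pi): cluster assignment of each point (0-based index)
    z = rng.choice(K, size=N, p=pi)
    # x_i ~ F(theta_{z_i}): assumed here to be N(theta_{z_i}, 1)
    x = rng.normal(theta[z], 1.0)
    return pi, theta, z, x

pi, theta, z, x = sample_view1(N=5, K=3)
print(z, x)
```

Note that the code indexes clusters from 0 rather than 1, and that the variance of $F$ is fixed purely to keep the sketch short.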
View 2
Let $\Theta = \set{\theta_1, \dots, \theta_K}$ be the parameter space for $x$. Let $G$ be a distribution on $\Theta$ defined as
$$\begin{aligned} G(\theta) &= \sum_{j=1}^{K} \pi_j \delta(\theta, \theta_j) \\ &= \pi_j \quad \text{if } \theta = \theta_j \end{aligned}$$
where $\pi \sim \Mr{Dir}(\alpha_1, \dots, \alpha_K)$ and $\theta_j \sim H(\lambda)$.
Let $\bar{\theta}_i$ denote the $i$-th draw (sample) from $G$. Then to sample $x_1, \dots, x_N$, for each $i$ we sample $\bar{\theta}_i \sim G$ and then $x_i \sim F(\bar{\theta}_i)$. The connection to View 1 is that $\bar{\theta}_i = \theta_{z_i}$.
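As a rough sketch of View 2, $G$ can be represented as a pair of arrays: the atoms $\theta_j$ and their weights $\pi_j$. Drawing $\bar{\theta}_i \sim G$ then means picking an atom with probability equal to its weight. As an assumption for illustration, $F$ is a one-dimensional Gaussian with unit variance and $H$ is $N(0, 10^2)$; the name `sample_view2` is hypothetical.

```python
import numpy as np

def sample_view2(N, K, alpha0=1.0, seed=0):
    """Draw N points by first sampling parameters from the discrete measure G."""
    rng = np.random.default_rng(seed)
    # G is determined by its weights pi ~ Dir(...) and atoms theta_j ~ H
    weights = rng.dirichlet(np.full(K, alpha0 / K))
    atoms = rng.normal(0.0, 10.0, size=K)  # H assumed to be N(0, 10^2)
    # theta_bar_i ~ G: each draw equals one of the K atoms
    theta_bar = rng.choice(atoms, size=N, p=weights)
    # x_i ~ F(theta_bar_i): assumed here to be N(theta_bar_i, 1)
    x = rng.normal(theta_bar, 1.0)
    return atoms, weights, theta_bar, x

atoms, weights, theta_bar, x = sample_view2(N=5, K=3)
print(theta_bar)
```

No explicit $z_i$ appears here: the assignment is implicit in which atom each $\bar{\theta}_i$ equals, which is the relation $\bar{\theta}_i = \theta_{z_i}$.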
Connection with De Finetti's Theorem:
- $\bar{\theta}_i$ corresponds to $y_i$
- $\Theta$ corresponds to $Y$
- The parameter space $\Pi := \set{\text{all possible }\pi\text{'s}}$ corresponds to $\Phi$
It is silly to limit $\Theta$ to a set of only $K$ elements. In the next chapter, we will extend $\Theta$ to the set of all possible $\theta$'s (that are compatible with $F$).