As a running example, we will use mixture modeling, starting with the parametric (finite) case.

Finite Mixture Models

Suppose we have data $x_1, \dots, x_N$ and we want to group them into clusters. Each cluster $j$ has distribution $F(\theta_j)$. For example, $F$ can be a multivariate Gaussian, in which case $\theta_j = (\mu_j, \Sigma_j)$.

Suppose we know the number of clusters $K$. There are two equivalent ways to write down the mixture model.

View 1

Let $z_i \in \set{1,\dots,K}$ be the cluster assignment of $x_i$. To sample $x_1, \dots, x_N$, for each $i$:

$z_i \sim \Mr{Multi}(\pi) = \Mr{Multi}(\pi_1, \dots, \pi_K)$
$x_i \sim F(\theta_{z_i})$

The priors are

$\theta_j \sim H(\lambda)$, where $H$ is a prior on $\theta$, usually chosen to be conjugate to $F$
$\pi \sim \Mr{Dir}(\alpha_1, \dots, \alpha_K)$. Usually, we set $\alpha_j = \alpha_0 / K$.
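The View 1 generative process can be sketched in a few lines of NumPy. This is a minimal sketch under assumptions that are not in the text: $F$ is a 1-D Gaussian with known standard deviation `sigma`, and $H(\lambda)$ is a Gaussian prior on the cluster mean with $\lambda = (\mu_0, \tau_0)$, which is conjugate to $F$ in this setting. The function name `sample_finite_mixture` and its defaults are hypothetical.

```python
import numpy as np


def sample_finite_mixture(n, k, alpha0, mu0=0.0, tau0=5.0, sigma=1.0, rng=None):
    """Draw n points from the View 1 finite mixture (hypothetical helper).

    Assumes F = Normal(theta, sigma^2) with known sigma, and
    H(lambda) = Normal(mu0, tau0^2) as the (conjugate) prior on each mean.
    """
    rng = np.random.default_rng(rng)
    # pi ~ Dir(alpha0/K, ..., alpha0/K)
    pi = rng.dirichlet(np.full(k, alpha0 / k))
    # theta_j ~ H(lambda), one mean per cluster
    theta = rng.normal(mu0, tau0, size=k)
    # z_i ~ Multi(pi): one categorical draw per data point (labels 0..K-1)
    z = rng.choice(k, size=n, p=pi)
    # x_i ~ F(theta_{z_i})
    x = rng.normal(theta[z], sigma)
    return x, z, theta, pi
```

For example, `sample_finite_mixture(500, 3, alpha0=1.0, rng=0)` returns 500 points together with their assignments. Smaller values of `alpha0` tend to concentrate the weight of $\pi$ on fewer clusters.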

View 2

Let $\Theta = \set{\theta_1, \dots, \theta_K}$ be the parameter space for $x$, and let $G$ be a distribution on $\Theta$ defined as

$\begin{aligned} G(\theta) &= \sum_j \pi_j \delta(\theta, \theta_j) \\ &= \pi_j \text{ if } \theta = \theta_j \text{, and } 0 \text{ otherwise} \end{aligned}$

where $\delta(\theta, \theta_j)$ equals 1 if $\theta = \theta_j$ and 0 otherwise, $\pi \sim \Mr{Dir}(\alpha_1, \dots, \alpha_K)$, and $\theta_j \sim H(\lambda)$ as in View 1.

Let $\bar{\theta}_i$ denote a draw (sample) from $G$. To sample $x_1, \dots, x_N$, we draw $\bar{\theta}_i \sim G$ and then $x_i \sim F(\bar{\theta}_i)$. The connection to View 1 is $\bar{\theta}_i = \theta_{z_i}$: picking the atom $\theta_j$ with probability $\pi_j$ is the same as drawing $z_i = j$ from $\Mr{Multi}(\pi)$.
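View 2 can be sketched the same way, this time storing $G$ explicitly as atoms and weights. This is a minimal sketch assuming, purely for illustration, that $F$ is a 1-D Gaussian with known standard deviation `sigma` and that $H(\lambda)$ is a Gaussian prior $\mathcal{N}(\mu_0, \tau_0^2)$ on the mean. The class `DiscreteMeasure` and the helper `sample_view2` are hypothetical names. Drawing from `G` also returns the index of the chosen atom, which is exactly the $z_i$ of View 1.

```python
import numpy as np


class DiscreteMeasure:
    """G = sum_j pi_j * delta(., theta_j): a distribution on K atoms (hypothetical class)."""

    def __init__(self, atoms, weights):
        self.atoms = np.asarray(atoms, dtype=float)
        self.weights = np.asarray(weights, dtype=float)

    def pmf(self, theta):
        # G(theta): sum of pi_j over atoms equal to theta; 0 if theta is not an atom
        return float(self.weights[self.atoms == theta].sum())

    def sample(self, n, rng):
        # Pick atom j with probability pi_j; j plays the role of z_i in View 1
        idx = rng.choice(len(self.atoms), size=n, p=self.weights)
        return self.atoms[idx], idx


def sample_view2(n, k, alpha0, mu0=0.0, tau0=5.0, sigma=1.0, rng=None):
    """Build G from the priors, then draw theta_bar_i ~ G and x_i ~ F(theta_bar_i)."""
    rng = np.random.default_rng(rng)
    pi = rng.dirichlet(np.full(k, alpha0 / k))  # pi ~ Dir(alpha0/K, ..., alpha0/K)
    atoms = rng.normal(mu0, tau0, size=k)       # theta_j ~ H(lambda)
    G = DiscreteMeasure(atoms, pi)
    theta_bar, z = G.sample(n, rng)             # theta_bar_i ~ G, with theta_bar_i = theta_{z_i}
    x = rng.normal(theta_bar, sigma)            # x_i ~ Normal(theta_bar_i, sigma^2)
    return x, theta_bar, z, G
```

Because the atoms are drawn from a continuous $H$, they are distinct with probability 1, so `G.pmf(G.atoms[j])` returns exactly `G.weights[j]`.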

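Connection with De Finetti's theorem: given $G$, the $\bar{\theta}_i$ are i.i.d., so marginally the sequence $\bar{\theta}_1, \bar{\theta}_2, \dots$ is exchangeable. The pieces correspond as follows: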

$\bar{\theta}_i$ corresponds to $y_i$
$\Theta$ corresponds to $Y$
The parameter space $\Pi := \set{\text{all possible }\pi\text{'s}}$ corresponds to $\Phi$
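Exchangeability can be made concrete for the assignments $z_i$ of View 1. Integrating out $\pi \sim \Mr{Dir}(\alpha_1, \dots, \alpha_K)$ gives the Dirichlet-multinomial marginal
$p(z_1, \dots, z_N) = \frac{\Gamma(A)}{\Gamma(A + N)} \prod_j \frac{\Gamma(\alpha_j + n_j)}{\Gamma(\alpha_j)}$,
where $A = \sum_j \alpha_j$ (equal to $\alpha_0$ under the usual choice $\alpha_j = \alpha_0 / K$) and $n_j$ is the number of $i$ with $z_i = j$. It depends only on the counts $n_j$, so permuting the $z_i$ leaves it unchanged. The sketch below (plain Python, hypothetical function names, labels $0, \dots, K-1$) computes it in closed form and, as a cross-check, sequentially via the chain rule.

```python
import math
from collections import Counter


def log_marginal_assignments(z, alphas):
    """log p(z_1..z_N) with pi ~ Dir(alphas) integrated out (closed form)."""
    counts = Counter(z)
    a_sum = sum(alphas)
    out = math.lgamma(a_sum) - math.lgamma(a_sum + len(z))
    for j, a in enumerate(alphas):
        # Only the count n_j of label j enters, never the order of the z_i
        out += math.lgamma(a + counts.get(j, 0)) - math.lgamma(a)
    return out


def log_marginal_sequential(z, alphas):
    """Same quantity via p(z_i = j | z_<i) = (alpha_j + n_j) / (A + i)."""
    counts = [0] * len(alphas)
    a_sum = sum(alphas)
    out = 0.0
    for i, j in enumerate(z):
        # i previous points have been seen; counts[j] of them have label j
        out += math.log((alphas[j] + counts[j]) / (a_sum + i))
        counts[j] += 1
    return out
```

For instance, `log_marginal_assignments([0, 0, 1, 2, 0, 1], [0.5] * 3)` and the same call on any reordering of that list return the same value, which is the exchangeability that De Finetti's theorem requires.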

It is rather restrictive to limit $\Theta$ to a set of $K$ elements. In the next chapter, we will extend $\Theta$ to the set of all possible $\theta$'s (those compatible with $F$).