Mixture Models


As a running example, we will consider the problem of mixture modeling. Let's consider the parametric case first.

Finite Mixture Models

Suppose we have data $x_1, \dots, x_N$ and we want to group them into clusters. Each cluster $j$ has distribution $F(\theta_j)$. For example, $F$ can be multivariate Gaussian, and $\theta_j$ will be $(\mu_j, \Sigma_j)$.

Suppose we know the number of clusters $K$. There are two ways to model the mixture.

View 1

[Figure: graphical model for View 1. Nodes: $\alpha$, $\pi$, $z_i$, $x_i$, $\theta_j$, $\lambda$; plate labels: $N$, $K$.]

Let $z_i \in \{1, \dots, K\}$. To sample $x_1, \dots, x_N$,

  • $z_i \sim \mathrm{Multi}(\pi) = \mathrm{Multi}(\pi_1, \dots, \pi_K)$
  • $x_i \sim F(\theta_{z_i})$
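
Marginalizing out $z_i$ in the two sampling steps above gives the usual mixture density. This is a short worked step, not part of the original notes; $f(\cdot \mid \theta)$ is notation introduced here for the density of $F(\theta)$.

```latex
\begin{aligned}
p(x_i \mid \pi, \theta_1, \dots, \theta_K)
  &= \sum_{j=1}^{K} P(z_i = j \mid \pi) \, f(x_i \mid \theta_j) \\
  &= \sum_{j=1}^{K} \pi_j \, f(x_i \mid \theta_j)
\end{aligned}
```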

The priors are

  • $\theta_j \sim H(\lambda)$ for some conjugate prior $H$ of $F$
  • $\pi \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K)$. Usually, we set $\alpha_j = \alpha_0 / K$ for every $j$.
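
The View 1 generative process can be sketched in a few lines of NumPy. This is only an illustration under assumptions not made in the notes: $F(\theta_j)$ is taken to be a one-dimensional Gaussian with mean $\theta_j$ and a fixed standard deviation, $H$ is taken to be a Gaussian prior on that mean, and the function and parameter names are hypothetical.

```python
import numpy as np

def sample_finite_mixture(N, K, alpha0=1.0, prior_mean=0.0, prior_std=5.0,
                          obs_std=1.0, seed=0):
    """Draw (pi, theta, z, x) following View 1.

    Illustrative assumptions: F(theta_j) = Normal(theta_j, obs_std^2) in 1-D,
    and H = Normal(prior_mean, prior_std^2).
    """
    rng = np.random.default_rng(seed)
    # pi ~ Dir(alpha0/K, ..., alpha0/K)
    pi = rng.dirichlet(np.full(K, alpha0 / K))
    # theta_j ~ H(lambda), one parameter per cluster
    theta = rng.normal(prior_mean, prior_std, size=K)
    # z_i ~ Multi(pi): cluster index of each point (0-based here)
    z = rng.choice(K, size=N, p=pi)
    # x_i ~ F(theta_{z_i})
    x = rng.normal(theta[z], obs_std)
    return pi, theta, z, x

pi, theta, z, x = sample_finite_mixture(N=500, K=3)
```

Note that the code indexes clusters from 0 to $K-1$, whereas the text uses $1, \dots, K$.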

View 2

[Figure: graphical model for View 2. Nodes: $H$, $\alpha$, $G$, $\bar\theta_i$, $x_i$; plate label: $N$.]

Let $\Theta = \{\theta_1, \dots, \theta_K\}$ be the parameter space for $x$. Let $G$ be a distribution on $\Theta$ defined as

$$
\begin{aligned}
G(\theta) &= \sum_j \pi_j \, \delta(\theta, \theta_j) \\
&= \pi_j \quad \text{for the } j \text{ such that } \theta = \theta_j
\end{aligned}
$$

where $\pi \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K)$ and $\theta_j \sim H(\lambda)$.

Let $\bar{\theta}$ be the draws (or the samples) from $G$. Then to sample $x_1, \dots, x_N$, we sample $\bar{\theta}_i \sim G$ and $x_i \sim F(\bar{\theta}_i)$. The connection to View 1 is that $\bar{\theta}_i = \theta_{z_i}$.
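
View 2 can be sketched the same way: build $G$ as a discrete distribution over the $K$ atoms, then draw parameters directly from it. As before, this is a sketch under assumptions not made in the notes (a one-dimensional Gaussian $F$ with unit variance, a Gaussian $H$), and `sample_from_G` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, alpha0 = 3, 500, 1.0

# Build G: atoms theta_j ~ H with weights pi ~ Dir(alpha0/K, ..., alpha0/K).
pi = rng.dirichlet(np.full(K, alpha0 / K))
theta = rng.normal(0.0, 5.0, size=K)  # assumed H = Normal(0, 5^2)

def sample_from_G(size):
    """Draw theta_bar ~ G: return atom theta_j with probability pi_j."""
    return rng.choice(theta, size=size, p=pi)

theta_bar = sample_from_G(N)    # theta_bar_i ~ G
x = rng.normal(theta_bar, 1.0)  # x_i ~ F(theta_bar_i), assumed unit variance

# Every draw is one of the K atoms: this is theta_bar_i = theta_{z_i},
# with the index z_i hidden inside the draw from G.
all_draws_are_atoms = bool(np.isin(theta_bar, theta).all())
```

The indicator $z_i$ never appears explicitly here. It is absorbed into the draw from $G$, which is what makes this view easy to generalize once $\Theta$ is no longer a finite set.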

Connection with De Finetti's Theorem:

  • $\bar{\theta}_i$ corresponds to $y_i$
  • $\Theta$ corresponds to $Y$
  • The parameter space $\Pi := \{\text{all possible } \pi\text{'s}\}$ corresponds to $\Phi$

It is silly to limit $\Theta$ to a set of only $K$ elements. In the next chapter, we will extend $\Theta$ to the set of all possible $\theta$'s (that are compatible with $F$).

Exported: 2021-01-02T21:32:14.873900