As a running example, we will consider the problem of mixture modeling. Let's consider the parametric case first.
Finite Mixture Models
Suppose we have data $x_1, \ldots, x_N$ and we want to group them into clusters. Each cluster $j$ has distribution $F(\theta_j)$. For example, $F$ can be a multivariate Gaussian, in which case $\theta_j = (\mu_j, \Sigma_j)$.
Suppose we know the number of clusters $K$. There are two ways to model the mixture.
View 1
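Let $z_i \in \{1, \ldots, K\}$ be the cluster label of $x_i$. To sample $x_1, \ldots, x_N$,
$z_i \sim \mathrm{Multi}(\pi) = \mathrm{Multi}(\pi_1, \ldots, \pi_K)$
$x_i \sim F(\theta_{z_i})$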
The priors are
$\theta_j \sim H(\lambda)$ for some conjugate prior $H$ of $F$
$\pi \sim \mathrm{Dir}(\alpha_1, \ldots, \alpha_K)$. Usually, we set $\alpha_j = \alpha_0 / K$ for every $j$.
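The View 1 sampling procedure can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions that are not part of the notes: F is taken to be a one-dimensional Gaussian with unit variance, H is taken to be a Gaussian with mean 0 and standard deviation 10 on the cluster mean, and the names `sample_dirichlet` and `sample_view1` are hypothetical. Cluster labels run from 0 to K-1 rather than 1 to K.

```python
import random

def sample_dirichlet(alphas, rng):
    # Normalizing independent Gamma(alpha_j, 1) draws gives a Dirichlet sample.
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def sample_view1(N, K, alpha0=1.0, seed=0):
    rng = random.Random(seed)
    # pi ~ Dir(alpha0/K, ..., alpha0/K)
    pi = sample_dirichlet([alpha0 / K] * K, rng)
    # theta_j ~ H: assumed here to be a N(0, 10^2) prior on each cluster mean
    mu = [rng.gauss(0.0, 10.0) for _ in range(K)]
    # z_i ~ Multi(pi): one cluster label per data point (0-based)
    z = rng.choices(range(K), weights=pi, k=N)
    # x_i ~ F(theta_{z_i}): assumed here to be N(mu_{z_i}, 1)
    x = [rng.gauss(mu[zi], 1.0) for zi in z]
    return pi, mu, z, x

pi, mu, z, x = sample_view1(N=10, K=3)
print(z)
```

Each data point first picks a label and then looks up that label's parameter, which is exactly the two-step structure of View 1.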
View 2
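Let $\Theta = \{\theta_1, \ldots, \theta_K\}$ be the parameter space for $x$. Let $G$ be a distribution on $\Theta$ defined as
$G(\theta) = \sum_{j=1}^{K} \pi_j \, \delta(\theta, \theta_j)$, so that $G(\theta) = \pi_j$ when $\theta = \theta_j$,
where $\pi \sim \mathrm{Dir}(\alpha_1, \ldots, \alpha_K)$ and $\theta_j \sim H(\lambda)$.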
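Let $\bar{\theta}_1, \ldots, \bar{\theta}_N$ be the draws (or the samples) from $G$. Then to sample $x_1, \ldots, x_N$, we sample $\bar{\theta}_i \sim G$ and $x_i \sim F(\bar{\theta}_i)$. The connection to View 1 is that $\bar{\theta}_i = \theta_{z_i}$.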
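View 2 can be sketched the same way, with G represented as a list of atoms and their weights. As before, this is an illustrative sketch rather than a definitive implementation: the one-dimensional unit-variance Gaussian for F, the Gaussian prior with mean 0 and standard deviation 10 for H, and the name `sample_view2` are all assumptions introduced here.

```python
import random

def sample_view2(N, K, alpha0=1.0, seed=0):
    rng = random.Random(seed)
    # Weights of G: pi ~ Dir(alpha0/K, ..., alpha0/K), via normalized Gamma draws
    gammas = [rng.gammavariate(alpha0 / K, 1.0) for _ in range(K)]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    # Atoms of G: theta_j ~ H, assumed here to be N(0, 10^2) on the cluster mean
    atoms = [rng.gauss(0.0, 10.0) for _ in range(K)]
    # theta_bar_i ~ G: draw a parameter directly, with no explicit label z_i
    theta_bar = rng.choices(atoms, weights=weights, k=N)
    # x_i ~ F(theta_bar_i): assumed here to be N(theta_bar_i, 1)
    x = [rng.gauss(t, 1.0) for t in theta_bar]
    return atoms, weights, theta_bar, x

atoms, weights, theta_bar, x = sample_view2(N=10, K=3)
# Every draw from G is one of the K atoms, matching theta_bar_i = theta_{z_i}.
print(all(t in atoms for t in theta_bar))
```

The only difference from View 1 is that the label is never materialized: the draw from G returns the parameter itself.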
Connection with De Finetti's Theorem:
$\bar{\theta}_i$ corresponds to $y_i$
$\Theta$ corresponds to $Y$
The parameter space $\Pi := \{\text{all possible } \pi\text{'s}\}$ corresponds to $\Phi$
It is silly to limit $\Theta$ to a set of only $K$ elements. In the next chapter, we will extend $\Theta$ to the set of all possible $\theta$'s (that are compatible with $F$).