Approaching the EM algorithm
The EM method is based on the notion of Maximum Likelihood Estimation (MLE) of θ, an unknown parameter of the data distribution. Given a sample space X, let x ∈ X be an observation drawn from the density f(x | θ), which depends on the parameter θ. We define the likelihood function of θ, given the single observation x, as L(θ | x) = f(x | θ).
The likelihood function is the conditional density f(x | θ) viewed as a function of its second argument, θ, while the first argument, the observation x, is held fixed.
When the sample consists of n independent observations x₁, …, xₙ, the likelihood becomes the product of the individual densities: L(θ | x₁, …, xₙ) = ∏ f(xᵢ | θ), with the product taken over i = 1, …, n.
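This product form is numerically fragile: each density value is typically less than 1, so the product of many of them underflows to 0 in floating point. A minimal sketch, assuming for illustration a standard normal density (a hypothetical model, not one fixed by the text):

```python
import math
import random

random.seed(0)


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# 2000 i.i.d. observations from N(0, 1)
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]

# The raw likelihood: a product of 2000 density values, each typically < 1.
likelihood = 1.0
for x in sample:
    likelihood *= normal_pdf(x)

print(likelihood)  # underflows to exactly 0.0 in double precision
```

This is one practical reason (besides easier differentiation) for working with the logarithm of the likelihood, introduced next.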
Since the likelihood is a product of many values close to 0, and since a sum is easier to differentiate than a product when computing the maximum likelihood estimates of the parameters, it is convenient to apply a logarithmic transformation and study what is called the log likelihood: ℓ(θ) = log L(θ | x₁, …, xₙ) = Σ log f(xᵢ | θ), with the sum taken over i = 1, …, n. Because the logarithm is monotonically increasing, the θ that maximizes the log likelihood also maximizes the likelihood itself.
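As a concrete sketch of maximizing a log likelihood, assume a hypothetical normal model N(μ, 1) for an i.i.d. sample (this specific model is an illustrative assumption, not one stated above). Setting the derivative of the log likelihood to zero gives the sample mean as the MLE of μ, and a coarse grid search over the log likelihood agrees with that closed form:

```python
import math
import random

random.seed(1)


def log_likelihood(mu, sample):
    """Log likelihood of mu for i.i.d. observations under N(mu, 1):
    a sum of log densities instead of a product of densities."""
    return sum(-0.5 * (x - mu) ** 2 - 0.5 * math.log(2.0 * math.pi)
               for x in sample)


# 500 i.i.d. observations from N(3, 1)
sample = [random.gauss(3.0, 1.0) for _ in range(500)]

# Maximize the log likelihood over a grid of candidate mu values in [0, 6].
grid = [i / 100.0 for i in range(601)]
mle_grid = max(grid, key=lambda mu: log_likelihood(mu, sample))

# Closed-form MLE of mu for this model: the sample mean.
sample_mean = sum(sample) / len(sample)

print(mle_grid, sample_mean)  # the grid maximizer sits next to the sample mean
```

The grid search is only for illustration; in this model the maximizer is available in closed form, and in models where it is not (such as mixtures), the EM algorithm provides the iterative alternative.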