Bayesian inference, entropy, and the multinomial distribution

Thomas Minka
Online tutorial (revised 1/2/03)

Instead of maximum-likelihood or MAP estimates, Bayesian inference encourages the use of predictive densities and evidence scores. This is illustrated in the context of the multinomial distribution, where predictive estimates are often used but rarely described as Bayesian. By using an entropy approximation to the evidence, many Bayesian quantities can be expressed in information-theoretic terms. For example, testing whether two samples come from the same distribution, or whether two variables are independent, reduces to a mutual information score (with appropriate smoothing). The same analysis can be applied to discrete Markov chains to obtain a test for Markovianity.
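
As a rough illustration of the quantities mentioned in the abstract, the following Python sketch computes a smoothed predictive estimate and an evidence score for multinomial counts, then compares the evidence for "two samples share one distribution" against "each sample has its own". It is not taken from the tutorial: the symmetric Dirichlet prior with parameter alpha, the default alpha = 1, and all function names are assumptions made for this example, and it uses the exact evidence rather than the entropy approximation.

```python
import math


def predictive(counts, alpha=1.0):
    """Predictive probability of each outcome under an assumed
    symmetric Dirichlet(alpha) prior: (n_k + alpha) / (N + K*alpha)."""
    n = sum(counts)
    k = len(counts)
    return [(c + alpha) / (n + k * alpha) for c in counts]


def log_evidence(counts, alpha=1.0):
    """Log marginal probability of one observed sequence with these counts,
    integrating the multinomial parameters over the Dirichlet prior."""
    n = sum(counts)
    k = len(counts)
    total = math.lgamma(k * alpha) - math.lgamma(n + k * alpha)
    for c in counts:
        total += math.lgamma(c + alpha) - math.lgamma(alpha)
    return total


def log_bayes_factor_same_source(counts_a, counts_b, alpha=1.0):
    """log p(data | one shared distribution) - log p(data | two distributions).
    Positive values favor a shared distribution."""
    pooled = [a + b for a, b in zip(counts_a, counts_b)]
    return (log_evidence(pooled, alpha)
            - log_evidence(counts_a, alpha)
            - log_evidence(counts_b, alpha))


if __name__ == "__main__":
    print(predictive([3, 1]))
    print(log_bayes_factor_same_source([25, 25], [25, 25]))
    print(log_bayes_factor_same_source([50, 0], [0, 50]))
```

With alpha = 1 the predictive estimate is the familiar add-one smoothing. The sign of the log Bayes factor plays the role of the test: similar count vectors give a positive score, while sharply different ones give a strongly negative score.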


Last modified: Thu Jun 22 16:50:45 GMT 2006