Bayesian inference in dynamic models -- an overview
The following algorithms all try to infer the hidden state of a dynamic
model from measurements. The input is a dynamic model and a measurement
sequence and the output is an approximate posterior distribution over the
hidden state at one or many times. Only discrete-time models are discussed
here.
Inferring only the most recent hidden state is known as filtering;
inferring past states is known as smoothing.
Most filtering methods are on-line, which means they
process each measurement exactly once, after which it can be discarded.
This allows them to operate with a fixed amount of memory.
The opposite of on-line is off-line or batch.
There are standard ways to turn an on-line filtering algorithm into a
batch filtering or smoothing algorithm.
Therefore, the overview is divided into two parts: on-line filtering and
batch filtering/smoothing.
Some of these algorithms are general algorithms for approximate Bayesian
inference and others are specialized for dynamic models. With the
description of each algorithm is a partial list of references. I've
included more references for algorithms which are less
well-known.
On-line filtering algorithms
The algorithms are grouped according to how they represent the posterior
distribution over the hidden state (their belief).
Gaussian belief
The following algorithms use a multivariate Gaussian for their belief. In
fact, most of them are more general than this---they could use any
exponential family as their belief.
- Kalman filter
- The Kalman filter only applies to models with Gaussian noise, linear
state equations, and linear measurement equations, i.e.
s_t = A s_(t-1) + noise
x_t = C s_t + noise
For these models the state posterior really is Gaussian, and the Kalman
filter is exact.
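As a sketch (my own code, not taken from any of the references), one predict/update step of the Kalman filter for the equations above, with state noise covariance Q and measurement noise covariance R, might look like:

```python
import numpy as np

def kalman_step(m, P, x, A, C, Q, R):
    """One Kalman filter step. (m, P) is the Gaussian belief on s_(t-1);
    returns the Gaussian belief on s_t given the measurement x = x_t."""
    # Predict: s_t = A s_(t-1) + noise, noise ~ N(0, Q)
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update with measurement x_t = C s_t + noise, noise ~ N(0, R)
    S = C @ P_pred @ C.T + R               # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)    # Kalman gain
    m_new = m_pred + K @ (x - C @ m_pred)
    P_new = P_pred - K @ C @ P_pred
    return m_new, P_new
```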
- Extended Kalman filter
- The Extended Kalman filter applies to models with Gaussian noise.
The state and measurement equations are linearized by a Taylor expansion
about the current state estimate. The noise variance in the equations is
not changed, i.e. the additional error due to linearization is not modeled.
After linearization, the Kalman filter is applied.
- "Stochastic models, estimation and control", Peter S. Maybeck,
Volume 2, Chapter 12, 1982.
- "A linear approximation method for probabilistic inference",
Ross Shachter, UAI'1990.
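A minimal sketch of the EKF step (again my own code, with function names of my choosing): the user supplies the nonlinear state and measurement functions f, h along with routines returning their Jacobians, and the Kalman update is applied to the linearization.

```python
import numpy as np

def ekf_step(m, P, x, f, F_jac, h, H_jac, Q, R):
    """EKF step: Taylor-linearize f and h about the current estimate,
    then apply the Kalman filter. The extra error due to linearization
    is not modeled."""
    m_pred = f(m)
    F = F_jac(m)                   # Jacobian of f at the old estimate
    P_pred = F @ P @ F.T + Q
    H = H_jac(m_pred)              # Jacobian of h at the predicted state
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    m_new = m_pred + K @ (x - h(m_pred))
    P_new = P_pred - K @ H @ P_pred
    return m_new, P_new
```

With linear f and h this reduces exactly to the Kalman filter.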
- Bottleneck filter
- This algorithm applies to any type of measurement equation.
The measurement equation is rewritten in terms of an intermediate
bottleneck variable b_t, such that p(x_t|b_t) is
simple while p(b_t|s_t) may be complicated. At each time step,
the Gaussian belief on s_t is converted into a Gaussian belief
on b_t (usually involving approximations), b_t is
updated exactly from x_t (since p(x_t|b_t) is
simple), and the new Gaussian belief on b_t is converted back to
a Gaussian belief on s_t.
("Bottleneck" is my own terminology. In the West paper below, they
used Gamma distributions.)
- "Dynamic Generalized Linear Models and Bayesian Forecasting,"
M. West, P. J. Harrison, & H. S. Migon, J Am Stat Assoc 80:73-97, 1985.
- Linear-update filter
a.k.a. linear-regression filter or "statistical linearization" filter
-
This algorithm applies to any type of measurement equation.
The measurement equation is converted into a linear-Gaussian equation
by regressing the observation onto the state.
The result is a Kalman filter whose Kalman gain is
cov(state,measurement)/var(measurement).
Note that the regression is done without reference to the actual measurement.
I call it "linear-update" because the update to the state is always a linear
function of the measurement.
A variety of approximations have been proposed when
cov(state,measurement)
is not available analytically. The unscented filter,
central difference filter, and divided difference filter are
filters of this type.
- "Stochastic models, estimation and control", Peter S. Maybeck,
Volume 2, Chapter 12, 1982.
- "Kalman Filters for nonlinear systems: a comparison of performance",
Tine Lefebvre, Herman Bruyninckx, Joris De Schutter.
- "The Unscented Kalman Filter for Nonlinear Estimation",
Eric A. Wan and Rudolph van der Merwe, 2000.
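As a sketch of the idea (my own code; real implementations use sigma points or divided differences rather than brute-force sampling), the required moments cov(state,measurement) and var(measurement) can be estimated by Monte Carlo when they are not available analytically:

```python
import numpy as np

def linear_update(m, P, x, h, R, n_samples=100000, seed=0):
    """Statistical-linearization update: estimate cov(state, measurement)
    and var(measurement) by sampling from the state belief N(m, P), then
    apply the linear update m + K (x - E[h(s)]). The regression is done
    without reference to the actual measurement x."""
    rng = np.random.default_rng(seed)
    s = rng.multivariate_normal(m, P, size=n_samples)   # state samples
    z = np.array([h(si) for si in s])                   # predicted measurements
    z_mean = z.mean(axis=0)
    ds = s - s.mean(axis=0)
    dz = z - z_mean
    C_sz = ds.T @ dz / (n_samples - 1)   # cov(state, measurement)
    S = dz.T @ dz / (n_samples - 1) + R  # var(measurement)
    K = C_sz @ np.linalg.inv(S)          # the "Kalman gain" of the text
    m_new = m + K @ (x - z_mean)
    P_new = P - K @ S @ K.T
    return m_new, P_new
```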
- Assumed-density filter a.k.a. moment matching
- This algorithm chooses the Gaussian belief which is "closest", in some
sense, to the exact state posterior given previous beliefs.
Usually this amounts to matching the moments of the exact posterior.
This is the most general approximation technique proposed to date---it
utilizes not only the form of the measurement equation but also the
measurement itself. The assumed-density filter requires analytic or
numerical solution of integrals over the measurement distribution.
For this, one could use Monte Carlo, quadrature, or Laplace's method.
-
"Expectation Propagation for approximate Bayesian inference",
T. Minka, Uncertainty in AI'2001.
- "Stochastic models, estimation and control", Peter S. Maybeck,
Volume 2, Chapter 12.7, 1982.
-
"A nonlinear filtering algorithm based on an approximation of the conditional distribution",
H. J. Kushner and A. S. Budhiraja, IEEE Trans Automatic Control
45(3):580-585, 2000.
-
"Gaussian filters for nonlinear filtering problems",
K. Ito and K. Q. Xiong, IEEE Trans Automatic Control 45(5): 910-927, 2000.
-
"Approximate Learning in Complex Dynamic Bayesian Networks",
Settimi, Smith, & Gargoum, UAI'1999.
-
"A comparison of sequential learning methods for incomplete data",
R. G. Cowell, A. P. Dawid, & P. Sebastiani, Bayesian Statistics 5, 1996.
-
"A Bayesian Approach to On-line Learning",
Manfred Opper, On-Line Learning in Neural Networks, 1999.
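A one-dimensional sketch of moment matching by quadrature (my own code): the zeroth, first, and second moments of the exact posterior are computed numerically on a grid, and the belief is set to the Gaussian with those moments. The likelihood here is supplied as an arbitrary log-likelihood function, illustrating that the measurement itself enters the approximation.

```python
import numpy as np

def adf_update_1d(m, v, loglik, grid_width=8.0, n=2001):
    """Assumed-density update in 1-D: moment-match a Gaussian to the
    exact posterior p(s) proportional to N(s; m, v) * exp(loglik(s)),
    using quadrature on a grid. loglik must accept an array of states."""
    s = np.linspace(m - grid_width * np.sqrt(v),
                    m + grid_width * np.sqrt(v), n)
    log_post = -(s - m)**2 / (2 * v) + loglik(s)
    w = np.exp(log_post - log_post.max())   # unnormalized posterior on grid
    w /= w.sum()
    m_new = np.sum(w * s)                   # match the mean
    v_new = np.sum(w * (s - m_new)**2)      # match the variance
    return m_new, v_new
```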
Mixture of Gaussian belief
A natural next step beyond a single Gaussian is a mixture-of-Gaussians
belief. Unfortunately, an algorithm for general dynamic
models has proven elusive. Instead, existing algorithms assume that the
dynamic model is a mixture of linear-Gaussian models, i.e. it switches
randomly between different linear-Gaussian state/measurement equations.
This can be understood as having discrete state variables in addition to
the continuous ones.
For these models, the true state posterior is a mixture of Gaussians, but a
very big one. The algorithms focus on reducing the size of this mixture,
in an on-line way.
Most of them generalize beyond Gaussian to any exponential family.
- Assumed-density filter
a.k.a. "pseudo-Bayes"
- Same as assumed-density filtering for a single Gaussian, but now the
belief representation is categorical for the discrete state component and
conditional Gaussian for the continuous state component, producing a
mixture-of-Gaussians marginal for the continuous state component. For each
setting of the discrete state component, this algorithm chooses the
Gaussian which is "closest" to the exact state posterior given previous
beliefs. A drawback of this algorithm is that the size of the mixture is
predetermined by the cardinality of the discrete state component. However,
it does allow the state/measurement equations, conditional on the discrete
state, to be non-Gaussian.
- Gaussian-sum filter
- This is a family of algorithms which propagate the exact posterior for
one step, giving a large Gaussian mixture, and then reduce the mixture
as needed. Methods for reducing the mixture include:
- Drop mixture components with low weight
- Sample mixture components according to weight
- Used in Rao-Blackwellised particle filters.
- Repeatedly merge most similar pair of mixture components
- Minimize divergence between the large and small mixture
by nonlinear optimization
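The merging strategy can be sketched as follows (my own code): two weighted Gaussian components are replaced by the single Gaussian with the same total weight, mean, and variance.

```python
def merge_pair(w1, m1, v1, w2, m2, v2):
    """Moment-preserving merge of two weighted 1-D Gaussian components
    (w_i, m_i, v_i) into one component with the same overall mean and
    variance as the two-component mixture."""
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m)**2) + w2 * (v2 + (m2 - m)**2)) / w
    return w, m, v
```

Repeatedly applying this to the most similar pair reduces the mixture to any desired size.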
Nonparametric belief
- Histogram filter
- Quantize the state space and represent the belief by a histogram.
The filtering recursion is then identical to that of a hidden Markov model.
-
"A general filter for measurements with any probability distribution",
Y. Rosenberg and M. Werman, CVPR'97, 654--659.
-
"Non-Gaussian state-space modeling of nonstationary time series",
G. Kitagawa, J Am Stat Assoc 82:1032--1063, 1987.
-
"Recursive Bayesian estimation using piecewise constant approximations",
Kramer and Sorenson, Automatica 24(6):789--801, 1988.
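One step of the histogram filter (my own sketch) is exactly the HMM forward recursion over the quantized states:

```python
import numpy as np

def histogram_filter_step(belief, transition, likelihood):
    """One step of grid-based filtering. belief is a probability vector
    over quantized states, transition[i, j] = p(s_t = j | s_(t-1) = i),
    and likelihood[j] = p(x_t | s_t = j). Same recursion as the HMM
    forward pass."""
    predicted = belief @ transition        # push belief through the dynamics
    posterior = predicted * likelihood     # weight by the measurement
    return posterior / posterior.sum()     # renormalize
```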
- Particle filter
-
Represent the state posterior by a large set of samples drawn from the
distribution. The particles are updated on-line to always represent the
current state posterior.
The most common way to do this is to pick a particle for the previous
state, sample a particle for the current state using the state equation,
and weight the particle by the probability of the measurement.
Resampling the particles according to weight then gives a set of
unweighted particles.
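The steps above can be sketched as one iteration of the bootstrap particle filter (my own code; `propagate` and `loglik` stand in for the model's state equation and measurement likelihood):

```python
import numpy as np

def bootstrap_step(particles, propagate, loglik, rng):
    """One bootstrap particle filter step: propagate each particle
    through the state equation, weight by the measurement likelihood,
    then resample by weight to get unweighted particles."""
    particles = propagate(particles, rng)      # sample s_t given s_(t-1)
    logw = loglik(particles)                   # log p(x_t | s_t)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]
```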
- Particle filter with interpolation
-
A particle filter where you increase diversity by fitting a density to
the particles and resampling from that density.
- Particle filter with MCMC steps
-
A particle filter where you increase diversity by including MCMC steps.
-
"Following a moving target---Bayesian inference for dynamic Bayesian models"
W. Gilks and C. Berzuini, J Royal Stat Soc B 63(1):127--146, 2001.
-
"Particle filters for state estimation of jump Markov linear systems"
-
"Sequential importance sampling for nonparametric Bayes models:
The next generation",
S. N. MacEachern, M. Clyde, J. S. Liu,
Canadian J of Statistics 27(2):251-267, 1999.
- MCMC filtering
- MCMC can be specially designed to provide efficient, bounded-memory
filtering, for example via randomized Gibbs sampling.
Batch filtering/smoothing algorithms
- Kalman smoothing
- Used with the Kalman filter, or any filter which linearizes the
state equations, e.g. EKF, UF, LUF, ADF.
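As a sketch (my own code), the Rauch-Tung-Striebel backward pass over the stored filtered beliefs, for the linear(ized) state equation with matrix A and noise covariance Q:

```python
import numpy as np

def rts_smoother(ms, Ps, A, Q):
    """Kalman (RTS) smoothing: backward pass over filtered means ms[t]
    and covariances Ps[t] produced by a (possibly linearized) Kalman
    filter with state matrix A and state noise covariance Q."""
    T = len(ms)
    ms_s, Ps_s = [None] * T, [None] * T
    ms_s[-1], Ps_s[-1] = ms[-1], Ps[-1]
    for t in range(T - 2, -1, -1):
        m_pred = A @ ms[t]
        P_pred = A @ Ps[t] @ A.T + Q
        G = Ps[t] @ A.T @ np.linalg.inv(P_pred)      # smoother gain
        ms_s[t] = ms[t] + G @ (ms_s[t + 1] - m_pred)
        Ps_s[t] = Ps[t] + G @ (Ps_s[t + 1] - P_pred) @ G.T
    return ms_s, Ps_s
```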
- Expectation Propagation
- Provides batch filtering and smoothing.
Can be used with any method for linearizing the measurement equations,
e.g. EKF, UF, LUF, ADF. Unlike Kalman smoothing, the measurement equations
are re-linearized until a globally-stable solution is reached, giving
better results.
- Variational lower bounds
- Provides batch filtering and smoothing. Meshes well with parameter
estimation.
- Two-filter smoothing
- Run filtering forward and independently in reverse and combine the results.
Useful for Gaussian-sum filters.
-
"The two-filter formula for smoothing and an implementation of the
Gaussian-sum smoother",
G. Kitagawa, Annals Institute of Statistical Mathematics 46(4):605-623, 1994
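In the Gaussian case the combination step is just a precision-weighted product. A tiny 1-D sketch (my own code), combining the forward belief N(mf, vf) with a backward message N(mb, vb) in the state:

```python
def combine_two_filter(mf, vf, mb, vb):
    """Combine forward and backward 1-D Gaussian messages by their
    precision-weighted product, giving the smoothed Gaussian."""
    v = 1.0 / (1.0 / vf + 1.0 / vb)
    m = v * (mf / vf + mb / vb)
    return m, v
```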
- Particle smoothing by sampling
- Reweight and resample the particles at time t,
based on a sampled particle from time t+1.
-
"Monte Carlo smoothing with application to audio signal
enhancement",
W. Fong, S. Godsill, A. Doucet, & M. West,
IEEE Trans. on Signal Processing, to appear, 2001.
-
"Monte Carlo filter and smoother for non-Gaussian nonlinear state
space models", G. Kitagawa, J. of Computational and Graphical
Statistics 5:1-25, 1996
- Particle smoothing by interpolation
- Reweight and resample the particles
at time t, based on a fitted density to the particles at time
t+1.
- MCMC
- Gibbs sampling or Metropolis-Hastings.
- Markov Random Field algorithms
- Relaxation, etc.
Last modified: Thu Mar 22 14:48:18 GMT 2007