Bayesian inference in dynamic models -- an overview
The following algorithms all try to infer the hidden state of a dynamic
model from measurements. The input is a dynamic model and a measurement
sequence and the output is an approximate posterior distribution over the
hidden state at one or many times. Only discrete-time models are discussed
here.
Inferring only the most recent hidden state is known as filtering;
inferring past states is known as smoothing.
Most filtering methods are on-line, which means they
process each measurement exactly once, after which it can be discarded.
This allows them to operate with a fixed amount of memory.
The opposite of on-line is off-line or batch.
There are standard ways to turn an on-line filtering algorithm into a
batch filtering or smoothing algorithm.
Therefore, the overview is divided into two parts: on-line filtering and
batch filtering/smoothing.
Some of these algorithms are general algorithms for approximate Bayesian
inference and others are specialized for dynamic models. With the
description of each algorithm is a partial list of references. I've
included more references for algorithms which are less
well-known.
On-line filtering algorithms
The algorithms are grouped according to how they represent the posterior
distribution over the hidden state (their belief).
Gaussian belief
The following algorithms use a multivariate Gaussian for their belief. In
fact, most of them are more general than this---they could use any
exponential family as their belief.
- Kalman filter
- The Kalman filter only applies to models with Gaussian noise, linear
state equations, and linear measurement equations, i.e.
s_t = A s_(t-1) + noise
x_t = C s_t + noise
For these models the state posterior really is Gaussian, and the Kalman
filter is exact.
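As a sketch (my own code, not taken from any of the references), one predict/update step of the Kalman filter for the equations above, with state noise covariance Q and measurement noise covariance R, might look like:

```python
import numpy as np

def kalman_step(m, P, x, A, C, Q, R):
    """One Kalman filter step. (m, P) is the Gaussian belief on s_(t-1);
    returns the Gaussian belief on s_t given the measurement x = x_t."""
    # Predict: s_t = A s_(t-1) + noise, noise ~ N(0, Q)
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update with measurement x_t = C s_t + noise, noise ~ N(0, R)
    S = C @ P_pred @ C.T + R               # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)    # Kalman gain
    m_new = m_pred + K @ (x - C @ m_pred)
    P_new = P_pred - K @ C @ P_pred
    return m_new, P_new
```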
- Extended Kalman filter
- The Extended Kalman filter applies to models with Gaussian noise.
The state and measurement equations are linearized by a Taylor expansion
about the current state estimate. The noise variance in the equations is
not changed, i.e. the additional error due to linearization is not modeled.
After linearization, the Kalman filter is applied.
- "Stochastic models, estimation and control", Peter S. Maybeck,
Volume 2, Chapter 12, 1982.
- "A linear approximation method for probabilistic inference",
Ross Shachter, UAI'1990.
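A minimal sketch of the EKF step (again my own code, with function names of my choosing): the user supplies the nonlinear state and measurement functions f, h along with routines returning their Jacobians, and the Kalman update is applied to the linearization.

```python
import numpy as np

def ekf_step(m, P, x, f, F_jac, h, H_jac, Q, R):
    """EKF step: Taylor-linearize f and h about the current estimate,
    then apply the Kalman filter. The extra error due to linearization
    is not modeled."""
    m_pred = f(m)
    F = F_jac(m)                   # Jacobian of f at the old estimate
    P_pred = F @ P @ F.T + Q
    H = H_jac(m_pred)              # Jacobian of h at the predicted state
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    m_new = m_pred + K @ (x - h(m_pred))
    P_new = P_pred - K @ H @ P_pred
    return m_new, P_new
```

With linear f and h this reduces exactly to the Kalman filter.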
- Bottleneck filter
- This algorithm applies to any type of measurement equation.
The measurement equation is rewritten in terms of an intermediate
bottleneck variable b_t, such that p(x_t|b_t) is
simple while p(b_t|s_t) may be complicated. At each time step,
the Gaussian belief on s_t is converted into a Gaussian belief
on b_t (usually involving approximations), b_t is
updated exactly from x_t (since p(x_t|b_t) is
simple), and the new Gaussian belief on b_t is converted back to
a Gaussian belief on s_t.
("Bottleneck" is my own terminology. In the West paper below, they
used Gamma distributions.)
- "Dynamic Generalized Linear Models and Bayesian Forecasting,"
M. West, P. J. Harrison, & H. S. Migon, J Am Stat Assoc 80:73-97, 1985.
- Linear-update filter
a.k.a. linear-regression filter or "statistical linearization" filter
-
This algorithm applies to any type of measurement equation.
The measurement equation is converted into a linear-Gaussian equation
by regressing the observation onto the state.
The result is a Kalman filter whose Kalman gain is
cov(state,measurement)/var(measurement).
Note that the regression is done without reference to the actual measurement.
I call it "linear-update" because the update to the state is always a linear
function of the measurement.
A variety of approximations have been proposed when
cov(state,measurement)
is not available analytically. The unscented filter,
central difference filter, and divided difference filter are
filters of this type.
- "Stochastic models, estimation and control", Peter S. Maybeck,
Volume 2, Chapter 12, 1982.
- "Kalman Filters for nonlinear systems: a comparison of performance",
Tine Lefebvre, Herman Bruyninckx, Joris De Schutter.
- "The Unscented Kalman Filter for Nonlinear Estimation",
Eric A. Wan and Rudolph van der Merwe, 2000.
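As a sketch of the idea (my own code; real implementations use sigma points or divided differences rather than brute-force sampling), the required moments cov(state,measurement) and var(measurement) can be estimated by Monte Carlo when they are not available analytically:

```python
import numpy as np

def linear_update(m, P, x, h, R, n_samples=100000, seed=0):
    """Statistical-linearization update: estimate cov(state, measurement)
    and var(measurement) by sampling from the state belief N(m, P), then
    apply the linear update m + K (x - E[h(s)]). The regression is done
    without reference to the actual measurement x."""
    rng = np.random.default_rng(seed)
    s = rng.multivariate_normal(m, P, size=n_samples)   # state samples
    z = np.array([h(si) for si in s])                   # predicted measurements
    z_mean = z.mean(axis=0)
    ds = s - s.mean(axis=0)
    dz = z - z_mean
    C_sz = ds.T @ dz / (n_samples - 1)   # cov(state, measurement)
    S = dz.T @ dz / (n_samples - 1) + R  # var(measurement)
    K = C_sz @ np.linalg.inv(S)          # the "Kalman gain" of the text
    m_new = m + K @ (x - z_mean)
    P_new = P - K @ S @ K.T
    return m_new, P_new
```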
- Assumed-density filter a.k.a. moment matching
- This algorithm chooses the Gaussian belief which is "closest", in some
sense, to the exact state posterior given previous beliefs.
Usually this amounts to matching the moments of the exact posterior.
This is the most general approximation technique proposed to date---it
utilizes not only the form of the measurement equation but also the
measurement itself. The assumed-density filter requires analytic or
numerical solution of integrals over the measurement distribution.
For this, one could use Monte Carlo, quadrature, or Laplace's method.
-
"Expectation Propagation for approximate Bayesian inference",
T. Minka, Uncertainty in AI'2001.
- "Stochastic models, estimation and control", Peter S. Maybeck,
Volume 2, Chapter 12.7, 1982.
-
"A nonlinear filtering algorithm based on an approximation of the conditional distribution",
H. J. Kushner and A. S. Budhiraja, IEEE Trans Automatic Control
45(3):580-585, 2000.
-
"Gaussian filters for nonlinear filtering problems",
K. Ito and K. Q. Xiong, IEEE Trans Automatic Control 45(5): 910-927, 2000.
-
"Approximate Learning in Complex Dynamic Bayesian Networks",
Settimi, Smith, & Gargoum, UAI'1999.
-
"A comparison of sequential learning methods for incomplete data",
R. G. Cowell, A. P. Dawid, & P. Sebastiani, Bayesian Statistics 5, 1996.
-
"A Bayesian Approach to On-line Learning",
Manfred Opper, On-Line Learning in Neural Networks, 1999.
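A one-dimensional sketch of moment matching by quadrature (my own code): the zeroth, first, and second moments of the exact posterior are computed numerically on a grid, and the belief is set to the Gaussian with those moments. The likelihood here is supplied as an arbitrary log-likelihood function, illustrating that the measurement itself enters the approximation.

```python
import numpy as np

def adf_update_1d(m, v, loglik, grid_width=8.0, n=2001):
    """Assumed-density update in 1-D: moment-match a Gaussian to the
    exact posterior p(s) proportional to N(s; m, v) * exp(loglik(s)),
    using quadrature on a grid. loglik must accept an array of states."""
    s = np.linspace(m - grid_width * np.sqrt(v),
                    m + grid_width * np.sqrt(v), n)
    log_post = -(s - m)**2 / (2 * v) + loglik(s)
    w = np.exp(log_post - log_post.max())   # unnormalized posterior on grid
    w /= w.sum()
    m_new = np.sum(w * s)                   # match the mean
    v_new = np.sum(w * (s - m_new)**2)      # match the variance
    return m_new, v_new
```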
Mixture of Gaussian belief
A natural next step beyond a single Gaussian is a mixture-of-Gaussians
belief. Unfortunately, an algorithm for general dynamic
models has proven elusive. Instead, existing algorithms assume that the
dynamic model is a mixture of linear-Gaussian models, i.e. it switches
randomly between different linear-Gaussian state/measurement equations.
This can be understood as having discrete state variables in addition to
the continuous ones.
For these models, the true state posterior is a mixture of Gaussians, but a
very big one. The algorithms focus on reducing the size of this mixture,
in an on-line way.
Most of them generalize beyond Gaussian to any exponential family.
- Assumed-density filter
a.k.a. "pseudo-Bayes"
- Same as assumed-density filtering for a single Gaussian, but now the
belief representation is categorical for the discrete state component and
conditional Gaussian for the continuous state component, producing a
mixture-of-Gaussians marginal for the continuous state component. For each
setting of the discrete state component, this algorithm chooses the
Gaussian which is "closest" to the exact state posterior given previous
beliefs. A drawback of this algorithm is that the size of the mixture is
predetermined by the cardinality of the discrete state component. However,
it does allow the state/measurement equations, conditional on the discrete
state, to be non-Gaussian.
- Gaussian-sum filter
- This is a family of algorithms which propagate the exact posterior for
one step, giving a large Gaussian mixture, and then reduce the mixture
as needed. Methods for reducing the mixture include:
- Drop mixture components with low weight
- Sample mixture components according to weight
- Used in Rao-Blackwellised particle filters.
- Repeatedly merge most similar pair of mixture components
- Minimize divergence between the large and small mixture
by nonlinear optimization
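The merging strategy can be sketched as follows (my own code): two weighted Gaussian components are replaced by the single Gaussian with the same total weight, mean, and variance.

```python
def merge_pair(w1, m1, v1, w2, m2, v2):
    """Moment-preserving merge of two weighted 1-D Gaussian components
    (w_i, m_i, v_i) into one component with the same overall mean and
    variance as the two-component mixture."""
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m)**2) + w2 * (v2 + (m2 - m)**2)) / w
    return w, m, v
```

Repeatedly applying this to the most similar pair reduces the mixture to any desired size.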
Nonparametric belief
- Histogram filter
- Quantize the state space and represent the belief by a histogram.
The filtering recursion is then identical to that of a hidden Markov model.
-
"A general filter for measurements with any probability distribution",
Y. Rosenberg and M. Werman, CVPR'97, 654--659.
-
"Non-Gaussian state-space modeling of nonstationary time series",
G. Kitagawa, J Am Stat Assoc 82:1032--1063, 1987.
-
"Recursive Bayesian estimation using piecewise constant approximations",
Kramer and Sorenson, Automatica 24(6):789--801, 1988.
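One step of the histogram filter (my own sketch) is exactly the HMM forward recursion over the quantized states:

```python
import numpy as np

def histogram_filter_step(belief, transition, likelihood):
    """One step of grid-based filtering. belief is a probability vector
    over quantized states, transition[i, j] = p(s_t = j | s_(t-1) = i),
    and likelihood[j] = p(x_t | s_t = j). Same recursion as the HMM
    forward pass."""
    predicted = belief @ transition        # push belief through the dynamics
    posterior = predicted * likelihood     # weight by the measurement
    return posterior / posterior.sum()     # renormalize
```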
- Particle filter
-
Represent the state posterior by a large set of samples drawn from the
distribution. The particles are updated on-line to always represent the
current state posterior.
The most common way to do this is to pick a particle for the previous
state, sample a particle for the current state using the state equation,
and weight the particle by the probability of the measurement.
Resampling the particles according to weight then gives a set of
unweighted particles.
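The steps above can be sketched as one iteration of the bootstrap particle filter (my own code; `propagate` and `loglik` stand in for the model's state equation and measurement likelihood):

```python
import numpy as np

def bootstrap_step(particles, propagate, loglik, rng):
    """One bootstrap particle filter step: propagate each particle
    through the state equation, weight by the measurement likelihood,
    then resample by weight to get unweighted particles."""
    particles = propagate(particles, rng)      # sample s_t given s_(t-1)
    logw = loglik(particles)                   # log p(x_t | s_t)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]
```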
- Particle filter with interpolation
-
A particle filter where you increase diversity by fitting a density to
the particles and resampling from that density.
- Particle filter with MCMC steps
-
A particle filter where you increase diversity by including MCMC steps.
-
"Following a moving target---Bayesian inference for dynamic Bayesian models"
W. Gilks and C. Berzuini, J Royal Stat Soc B 63(1):127--146, 2001.
-
"Particle filters for state estimation of jump Markov linear systems"
-
"Sequential importance sampling for nonparametric Bayes models:
The next generation",
S. N. MacEachern, M. Clyde, J. S. Liu,
Canadian J of Statistics 27(2):251-267, 1999.
- MCMC filtering
- MCMC can be specially designed to provide efficient, bounded-memory
filtering, for example via randomized Gibbs sampling.
Batch filtering/smoothing algorithms
- Kalman smoothing
- Used with the Kalman filter, or any filter which linearizes the
state equations, e.g. EKF, UF, LUF, ADF.
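As a sketch (my own code), the Rauch-Tung-Striebel backward pass over the stored filtered beliefs, for the linear(ized) state equation with matrix A and noise covariance Q:

```python
import numpy as np

def rts_smoother(ms, Ps, A, Q):
    """Kalman (RTS) smoothing: backward pass over filtered means ms[t]
    and covariances Ps[t] produced by a (possibly linearized) Kalman
    filter with state matrix A and state noise covariance Q."""
    T = len(ms)
    ms_s, Ps_s = [None] * T, [None] * T
    ms_s[-1], Ps_s[-1] = ms[-1], Ps[-1]
    for t in range(T - 2, -1, -1):
        m_pred = A @ ms[t]
        P_pred = A @ Ps[t] @ A.T + Q
        G = Ps[t] @ A.T @ np.linalg.inv(P_pred)      # smoother gain
        ms_s[t] = ms[t] + G @ (ms_s[t + 1] - m_pred)
        Ps_s[t] = Ps[t] + G @ (Ps_s[t + 1] - P_pred) @ G.T
    return ms_s, Ps_s
```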
- Expectation Propagation
- Provides batch filtering and smoothing.
Can be used with any method for linearizing the measurement equations,
e.g. EKF, UF, LUF, ADF. Unlike Kalman smoothing, the measurement equations
are re-linearized until a globally-stable solution is reached, giving
better results.
- Variational lower bounds
- Provides batch filtering and smoothing. Meshes well with parameter
estimation.
- Two-filter smoothing
- Run filtering forward and independently in reverse and combine the results.
Useful for Gaussian-sum filters.
-
"The two-filter formula for smoothing and an implementation of the
Gaussian-sum smoother",
G. Kitagawa, Annals Institute of Statistical Mathematics 46(4):605-623, 1994
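In the Gaussian case the combination step is just a precision-weighted product. A tiny 1-D sketch (my own code), combining the forward belief N(mf, vf) with a backward message N(mb, vb) in the state:

```python
def combine_two_filter(mf, vf, mb, vb):
    """Combine forward and backward 1-D Gaussian messages by their
    precision-weighted product, giving the smoothed Gaussian."""
    v = 1.0 / (1.0 / vf + 1.0 / vb)
    m = v * (mf / vf + mb / vb)
    return m, v
```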
- Particle smoothing by sampling
- Reweight and resample the particles at time t,
based on a sampled particle from time t+1.
-
"Monte Carlo smoothing with application to audio signal
enhancement",
W. Fong, S. Godsill, A. Doucet, & M. West,
IEEE Trans. on Signal Processing, to appear, 2001.
-
"Monte Carlo filter and smoother for non-Gaussian nonlinear state
space models", G. Kitagawa, J. of Computational and Graphical
Statistics 5:1-25, 1996
- Particle smoothing by interpolation
- Reweight and resample the particles
at time t, based on a fitted density to the particles at time
t+1.
- MCMC
- Gibbs sampling or Metropolis-Hastings.
- Markov Random Field algorithms
- Relaxation, etc.
Last modified: Thu Mar 22 14:48:18 GMT 2007