Part 4b : Approximate Inference in Bayesian Networks


  • For many real-world problems, exact inference is intractable :
    a solution : look for ways to compute approximate answers with tractable computational effort
    • Optimisation-based Methods (Variational Inference)
    • Stochastic Sampling Methods (Particle-based Methods)
  • Approximate Inference via Stochastic Sampling
    • approximate P(X) via a limited set of points randomly drawn from P
    • (random) samples/particles
    • estimate any desired probability P(x) from these samples
  • consistent
    A sampling algorithm that satisfies the following property:
  • In the limit (M → ∞), the empirical distribution \hat P should converge to the target distribution P
  • Sample a Single Variable :
    • Uniform distribution --> standard pseudo-random number generator
    • Discrete (categorical) distribution
  • Discrete (categorical) distribution
    • partition [0,1] into k subintervals with lengths proportional to the probabilities P(x^1), ..., P(x^k)
    • generate a random real number s uniformly from the interval [0,1]
    • if s is in the i^{th} subinterval, then the sampled value is x^i
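A minimal Python sketch of this subinterval scheme (the helper name is illustrative, not from the notes; replacing the linear scan with a binary search over cumulative sums gives the O(log d) lookup used in the complexity card below):

```python
import random

def sample_categorical(values, probs):
    """Partition [0,1] into subintervals with lengths proportional to
    the probabilities, then return the value whose subinterval contains
    a uniform draw s."""
    s = random.random()      # uniform sample from [0, 1)
    cumulative = 0.0
    for value, p in zip(values, probs):
        cumulative += p      # right edge of the i-th subinterval
        if s < cumulative:
            return value
    return values[-1]        # guard against floating-point round-off

# e.g. P(x^1)=0.2, P(x^2)=0.5, P(x^3)=0.3
print(sample_categorical(["x1", "x2", "x3"], [0.2, 0.5, 0.3]))
```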
  • Forward Sampling Algorithm :
    1. Choose a topological ordering X of the variables
    2. For i = 1 to N : sample a value x_i from Val(X_i) according to the conditional distribution P(X_i | pa(X_i)), given the parent values sampled previously
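A sketch of forward sampling on a tiny hypothetical Rain → GrassWet network; the network, the dict-based CPD representation, and all names are illustrative assumptions, not from the notes:

```python
import random

# CPDs as {parent_assignment_tuple: {value: probability}} (illustrative)
P_RAIN = {(): {True: 0.2, False: 0.8}}
P_WET  = {(True,):  {True: 0.9, False: 0.1},
          (False,): {True: 0.1, False: 0.9}}

# Topological ordering: each variable appears after its parents
ORDER = [("Rain", (), P_RAIN), ("GrassWet", ("Rain",), P_WET)]

def sample_categorical(dist):
    """Inverse-CDF sampling over a {value: probability} dict."""
    s, cum = random.random(), 0.0
    for value, p in dist.items():
        cum += p
        if s < cum:
            return value
    return value  # float round-off guard

def forward_sample():
    """Sample each X_i from P(X_i | pa(X_i)) in topological order."""
    event = {}
    for name, parents, cpd in ORDER:
        parent_values = tuple(event[p] for p in parents)
        event[name] = sample_categorical(cpd[parent_values])
    return event

print(forward_sample())  # e.g. {'Rain': False, 'GrassWet': False}
```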
  • Forward sampling is consistent
    The probability distribution P according to which the samples are generated is equal to the true distribution P represented by the Bayesian network
  • Generating one random sample can be done in O(nk log(d))
  • How to estimate the probability P(x_1, ..., x_n) of an atomic event (x_1, ..., x_n)
    1. Sample a set D of M events randomly via Forward Sampling
    2. Count the number of times K that our event matches in D
    3. Estimate \hat P via the relative frequency K/M
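A possible counting helper on top of a sampler like the forward_sample sketch above (sample_fn, the dict comparison, and the default M are assumptions):

```python
def estimate_event_probability(sample_fn, event, M=10000):
    """Estimate P(event) by the relative frequency K/M of exact matches
    among M independently drawn samples."""
    K = sum(1 for _ in range(M) if sample_fn() == event)
    return K / M

# e.g. estimate_event_probability(forward_sample,
#                                 {"Rain": True, "GrassWet": True})
```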
  • Rejection sampling :
    given : Bayesian network, a query
    1. sample : probabilistically sample a set D of M events
    2. Filter/Reject : discard all samples not compatible with the evidence
    3. Estimate : for each value combination x, count the number K of samples in D_e that match x
    \hat P(x | e) = \frac{K}{|D_e|}
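A rejection-sampling sketch in the same spirit, reusing any sampler such as the forward_sample sketch above (all names are illustrative):

```python
def rejection_sample(sample_fn, evidence, query_var, M=10000):
    """Estimate P-hat(query_var | evidence) as K / |D_e|."""
    counts, kept = {}, 0
    for _ in range(M):
        event = sample_fn()
        # Filter/Reject: discard samples incompatible with the evidence
        if any(event[var] != val for var, val in evidence.items()):
            continue
        kept += 1  # |D_e| grows only for compatible samples
        counts[event[query_var]] = counts.get(event[query_var], 0) + 1
    return {x: k / kept for x, k in counts.items()} if kept else {}

# e.g. rejection_sample(forward_sample, {"GrassWet": True}, "Rain")
```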
  • Rejection sampling
    +
    • extremely simple
    • consistent
    -
    • wasteful
  • Problem with naive likelihood weighting : in certain situations the true probability is much higher than the sampling suggests; the naive algorithm is unaware of this
  • LW sampling algorithm :
    compute : a weighted sample <(y,e), w>
    algorithm :
    • choose a topological ordering
    • initialise the weight w of the current sample to w = 1.0
    • for i = 1 to N :
      if X_i is one of the evidence variables :
        set X_i to its given evidence value e_i
        update the weight : w <- w * P(e_i | pa(X_i))
      else : sample a value for X_i from Val(X_i) according to P(X_i | pa(X_i))
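A sketch of likelihood weighting on the same hypothetical Rain → GrassWet network used earlier (all names and the CPD representation are illustrative assumptions):

```python
import random

P_RAIN = {(): {True: 0.2, False: 0.8}}
P_WET  = {(True,):  {True: 0.9, False: 0.1},
          (False,): {True: 0.1, False: 0.9}}
ORDER = [("Rain", (), P_RAIN), ("GrassWet", ("Rain",), P_WET)]

def sample_categorical(dist):
    s, cum = random.random(), 0.0
    for value, p in dist.items():
        cum += p
        if s < cum:
            return value
    return value

def weighted_sample(evidence):
    """Return one weighted sample <(y, e), w>."""
    event, w = {}, 1.0
    for name, parents, cpd in ORDER:
        dist = cpd[tuple(event[p] for p in parents)]
        if name in evidence:
            event[name] = evidence[name]  # fix the evidence value
            w *= dist[evidence[name]]     # w <- w * P(e_i | pa(X_i))
        else:
            event[name] = sample_categorical(dist)
    return event, w

# Estimate P(Rain | GrassWet = True) from 10000 weighted samples
totals = {True: 0.0, False: 0.0}
for _ in range(10000):
    event, w = weighted_sample({"GrassWet": True})
    totals[event["Rain"]] += w
Z = sum(totals.values())
print({rain: t / Z for rain, t in totals.items()})  # True near 0.69 here
```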
  • Likelihood weighting
    +
    • consistent
    • less wasteful than rejection sampling
    -
    • evidence affects only the sampling of variables below it in the ordering
    • many samples will be nearly irrelevant (they receive very small weights)
  • Importance sampling:
    • P : target distribution
    • Q : proposal distribution
  • Importance sampling : computing the weight
    • w(x_i) must correct for the fact that x_i was drawn from the distribution Q
    w(x_i) = \frac{P(x_i)}{Q(x_i)}
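A generic self-normalised importance-sampling sketch; the coin toy and every name here are hypothetical:

```python
import random

def importance_estimate(f, p, q, q_sampler, M=10000):
    """Estimate E_P[f(X)] from samples x_i ~ Q, weighting each sample by
    w(x_i) = P(x_i) / Q(x_i) to correct for drawing from Q instead of P."""
    num = den = 0.0
    for _ in range(M):
        x = q_sampler()
        w = p(x) / q(x)
        num += w * f(x)
        den += w
    return num / den  # self-normalised estimate

# Toy check: target P puts 0.8 on True, proposal Q is a fair coin
p = lambda x: 0.8 if x else 0.2
q = lambda x: 0.5
estimate = importance_estimate(lambda x: 1.0 if x else 0.0, p, q,
                               lambda: random.random() < 0.5)
print(estimate)  # close to 0.8
```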
  • Mutilated Network B_{E=e}
    • the evidence variables E_i have no parents in B_{E=e}
    • the CPD of each E_i in B_{E=e} gives probability 1.0 to E_i = e_i and probability 0 to all other e'_i \in Val(E_i)
    • the parents and CPDs of all other variables X \not\in E are as in B
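A small sketch of how the mutilation could be applied to the dict-based CPD representation from the earlier sketches (that representation is itself an assumption):

```python
def mutilate(cpds, parents, evidence):
    """Build B_{E=e}: each evidence variable E_i loses its parents and
    gets a deterministic CPD putting probability 1.0 on E_i = e_i."""
    new_cpds, new_parents = dict(cpds), dict(parents)
    for var, observed in evidence.items():
        new_parents[var] = ()  # E_i has no parents in B_{E=e}
        values = next(iter(cpds[var].values())).keys()  # Val(E_i)
        new_cpds[var] = {(): {v: 1.0 if v == observed else 0.0
                              for v in values}}
    return new_cpds, new_parents  # all other variables stay as in B
```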
  • Markov Chain Monte Carlo (MCMC) methods :
    • when we cannot sample directly from the target distribution P
    • construct a sequence of samples x_1, x_2, ...
    • with x_i drawn from some proposal distribution Q_i
    • do this such that Q_{i+1} is closer to the target than Q_i
    --> the sequence needs to converge to our target distribution
  • Gibbs Sampling :
    • start with a sample x^{(0)} = (y^{(0)}, e) drawn from the mutilated network
    • iteratively try to improve the current sample by iterating over each non-evidence variable and resampling a value for it, given the current values of all the other variables
  • Markov Blanket Mb(A)
    the set of variables consisting of A's parents, its children, and its children's other parents
  • The Gibbs resampling distribution
    P(Y_i | x_{-y_i}) = P(Y_i | mb(Y_i))
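A sketch of one Gibbs sweep over the dict-based representation used earlier; it exploits the identity above by multiplying only the factors that mention the resampled variable, i.e. its own CPD and those of its children (boolean variables and all names are assumptions):

```python
import random

def local_score(value, var, event, order):
    """Product of the CPD factors that mention `var`: P(var | pa(var))
    and P(c | pa(c)) for each child c -- exactly the Markov blanket terms."""
    e = dict(event, **{var: value})
    score = 1.0
    for name, parents, cpd in order:
        if name == var or var in parents:
            score *= cpd[tuple(e[p] for p in parents)][e[name]]
    return score

def gibbs_sweep(event, free_vars, order, values=(True, False)):
    """Resample each non-evidence variable given all other variables."""
    for var in free_vars:
        scores = {v: local_score(v, var, event, order) for v in values}
        z = sum(scores.values())
        s, cum = random.random(), 0.0
        for v, sc in scores.items():
            cum += sc / z  # normalised P(Y_i | mb(Y_i))
            if s < cum:
                event[var] = v
                break
    return event

# e.g. start from a mutilated-network sample, then repeat gibbs_sweep
```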
  • A Markov chain is a model of a discrete-time probabilistic process that gives a distribution over a set of possible states at any time t. It fulfils the Markov property
  • Markov chain is defined by a state space and a transition model
  • Metropolis-Hastings is a general algorithm for constructing Markov chains for any desired stationary distribution
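A generic Metropolis-Hastings sketch; the random-walk example and every name are illustrative assumptions:

```python
import math
import random

def metropolis_hastings(p, propose, q, x0, steps=10000):
    """p: unnormalised target density; propose(x) draws x' ~ Q(.|x);
    q(x_to, x_from) evaluates the proposal density Q(x_to | x_from)."""
    x, chain = x0, []
    for _ in range(steps):
        x_new = propose(x)
        # accept with probability min(1, P(x')Q(x|x') / (P(x)Q(x'|x)))
        alpha = min(1.0, (p(x_new) * q(x, x_new)) / (p(x) * q(x_new, x)))
        if random.random() < alpha:
            x = x_new
        chain.append(x)  # the chain's stationary distribution is P
    return chain

# Symmetric random-walk proposal: the Q terms cancel (Metropolis case)
chain = metropolis_hastings(
    p=lambda x: math.exp(-x * x / 2),            # unnormalised N(0, 1)
    propose=lambda x: x + random.uniform(-1, 1),
    q=lambda x_to, x_from: 1.0,                  # constant for symmetric Q
    x0=0.0)
print(sum(chain) / len(chain))  # sample mean, near 0
```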
  • Burn-in Time
    the number of steps T we run the chain before collecting samples; related to the mixing time